
Audio Visual Emotion Based Music Player

Prof. Minal P. Nerkar, Surykant Wankhade, Neha Chhajed
nerkar.minal@gmail.com, surykantw10@2gmail.com, Nehachhajed12317@gmail.com
Computer Department, AISSMS IOIT, Pune

ABSTRACT

This project puts forth a framework for real-time face and voice recognition and a related emotion detection system based on facial features, their actions and the intensity of the voice. The key elements of the face are considered for predicting the user's facial emotions. The differences in each facial feature, which are invariant to scaling as well as rotation, are used to determine the different facial emotions. Machine learning algorithms are used for the recognition and classification of the different classes of facial emotions by training on different sets of images. In this context, implementing the algorithms presented herein would contribute to several areas of identification and to many real-world problems. The proposed algorithm is implemented using Open Source Computer Vision (OpenCV).

Keywords

Open Source Computer Vision (OpenCV); object recognition; Sound Meter;

1. INTRODUCTION

Since machine analysis began to be applied to human subjects, it has seen great progress. Facial expression and speech are the two main channels of human emotional expression, since people mostly rely on facial expression and speech to understand someone's present state. In recent years, interest in recognizing human emotions from facial expression and speech, i.e. audio visual emotion recognition, has increased, attracting extensive attention in artificial intelligence. As audio visual emotion recognition is motivated by establishing a reliable and less complex relationship between humans and computers, it is of particular importance to human computer interaction (HCI). This research mainly focuses on easing human work by increasing the interaction between humans and computers, which in turn helps to increase the use of computers in day-to-day work. As computers have become an important asset of our fast-paced lives, the need for meaningful and easy communication between computers and humans has also increased. In addition, speech recognition is a successfully established area of research, but its main limitation is that it cannot respond appropriately to the emotions of different people. To overcome this drawback, computers are being developed that are capable of detecting, understanding and replying to the multiple emotional states of various people, similar to how a human being does. Hence, audio-emotion recognition (AER) is a recent field of study that is providing great advancement in the field of Human-Computer Interaction (HCI).

2. PREVIOUS WORKS

Most work on machine recognition of human emotions from facial expressions or emotional speech uses only a single modality, either speech or video, as the data. Relatively little work has been done on combining the two modalities so that a machine can recognize human emotions. In this paper, we investigate the integration of both visual and acoustic information so that the machine can recognize apparent emotions, which may or may not be the true emotion of the human user. We assume the user is willingly showing his or her emotions through facial expression and speech as a means of communication. There are some systems which require manual selection of the current emotion from a list of predefined emotions. Websites like Stereomood are limited in the sense that the user needs to type in what he is feeling, rather than computer vision being used to determine his emotion.


Similarly, an Android application named Pindrop also provides predefined emotions, and users are required to select one of the available emotions. These applications are dynamically updatable with the latest songs but lack the ability to determine the user's exact emotion using computer vision. A lot of work has been done in the past to solve the problem of emotion recognition. To determine the emotion of a user, we need to extract features from an image and use them against a trained data set to classify the input and determine the emotion.

3. PROPOSED SYSTEM

System architecture

The system architecture for the proposed system is given in Fig 3.1 [1]. The input image is loaded into the system in .jpg format. Each image then undergoes pre-processing, i.e. removal of unwanted information such as background color and illumination, and resizing of the images. The required features are then extracted from the image and stored as useful information. These features are later passed to the classifier, where the expression is recognized with the help of the Scale Invariant Feature Transform (SIFT) algorithm. The smaller the calculated distance, the closer the match that is found. Finally, a music track is played based on the emotion detected for the user.
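As a rough illustration of this minimum-distance matching step, the sketch below assumes Python with OpenCV; the reference image file names, emotion labels and track names are hypothetical. It matches the SIFT descriptors of an input face against labelled reference images and plays the track of the label whose average match distance is smallest.

```python
import cv2

sift = cv2.SIFT_create()
bf = cv2.BFMatcher(cv2.NORM_L2)  # Euclidean distance between SIFT descriptors

def descriptors(path):
    """Load an image in gray-scale and compute its SIFT descriptors."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return desc

# Hypothetical reference images, one labelled example per emotion class.
references = {
    "happy": descriptors("ref_happy.jpg"),
    "sad": descriptors("ref_sad.jpg"),
    "surprised": descriptors("ref_surprised.jpg"),
}

def predict_emotion(path):
    """Return the emotion whose reference image matches with the smallest average descriptor distance."""
    query = descriptors(path)
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        matches = bf.match(query, ref)
        if not matches:
            continue
        avg = sum(m.distance for m in matches) / len(matches)
        if avg < best_dist:  # the smaller the distance, the closer the match
            best_label, best_dist = label, avg
    return best_label

# Hypothetical mapping from detected emotion to a music track.
tracks = {"happy": "upbeat.mp3", "sad": "mellow.mp3", "surprised": "energetic.mp3"}
print(tracks[predict_emotion("input_face.jpg")])
```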


Fig 1. Basic steps

Basic Steps for web module

A. Image Acquisition

In any image processing technique, the first task is to acquire the image from the source. These images can be acquired either through a camera or through standard data sets that are available online. The images should be in .jpg format. We have used our own data set for real-time emotion detection.
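A minimal acquisition sketch with OpenCV (assumed here; the camera index and file names are illustrative) covers both sources: grabbing a frame from a camera or loading a .jpg from a data set on disk.

```python
import cv2

def acquire_from_camera(save_path="capture.jpg", camera_index=0):
    """Grab a single frame from the camera and save it as a .jpg file."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("Could not read a frame from the camera")
    cv2.imwrite(save_path, frame)
    return frame

def acquire_from_dataset(path):
    """Load an image from a standard data set stored on disk."""
    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    return img
```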

B. Pre-processing

The Scale Invariant Feature Transform (SIFT) algorithm is used for pre-processing, which is mainly done to eliminate unwanted information from the acquired image and to fix certain values for it so that they remain the same throughout. In the pre-processing phase, the images are converted from RGB to gray-scale and resized to 256*256 pixels. The images considered are in .jpg format; any other format is not considered for further processing. During pre-processing, the eyes, eyebrows and mouth are considered to be the regions of interest.
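A sketch of this pre-processing, assuming OpenCV: convert to gray-scale, optionally crop the face region containing the eyes, eyebrows and mouth using one of OpenCV's bundled Haar cascades, and resize to 256*256 pixels.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(img):
    """Convert a BGR image to gray-scale, crop the face if one is found,
    and resize the result to 256x256 pixels."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        x, y, w, h = faces[0]  # keep the first detected face as the region of interest
        gray = gray[y:y + h, x:x + w]
    return cv2.resize(gray, (256, 256))
```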

C. Facial Feature Extraction

After pre-processing, the next step in the Scale Invariant Feature Transform (SIFT) algorithm is feature extraction. The extracted facial features are stored as the useful information. The facial features that can be considered are the mouth, forehead, eyes, complexion of the skin, cheek and chin dimples, eyebrows, nose and wrinkles on the face. In this work, the eyes, eyebrows and mouth are considered for feature extraction because they depict the most expressive regions: with the mouth open or an eyebrow raised, one can easily recognize that a person is surprised or fearful, whereas this can never be inferred from a person's complexion.
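The extraction step can be sketched as follows with OpenCV's SIFT implementation (assumed environment; the file names are hypothetical). The descriptors returned here are the "useful information" that is stored and later passed to the classifier.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

def extract_features(gray_face):
    """Detect SIFT keypoints on a pre-processed 256x256 gray-scale face and
    return their 128-dimensional descriptors (one row per keypoint)."""
    keypoints, descriptors = sift.detectAndCompute(gray_face, None)
    return keypoints, descriptors

# Store the descriptors as the useful information for the classifier
# (hypothetical file names).
img = cv2.imread("face_happy_01.jpg", cv2.IMREAD_GRAYSCALE)
_, desc = extract_features(cv2.resize(img, (256, 256)))
np.save("features_happy_01.npy", desc)
```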

Basic steps for android module:

A. Detection

Detection of sound is done using the sound meter on the Android phone.
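The Android sound meter essentially reports the level of the microphone signal. As a language-neutral illustration of what such a meter computes, the sketch below (Python/NumPy, assumed here rather than the Android API) converts a buffer of audio samples to an RMS level in decibels, which can then be thresholded or bucketed into intensity classes.

```python
import numpy as np

def sound_level_db(samples, eps=1e-12):
    """Return the RMS level of an audio buffer in decibels relative to full scale.
    `samples` is a float array with values in [-1.0, 1.0]."""
    rms = np.sqrt(np.mean(np.square(samples)))
    return 20.0 * np.log10(rms + eps)

# Illustrative buffer: a 440 Hz tone at half amplitude, sampled at 44.1 kHz.
t = np.linspace(0, 1, 44100, endpoint=False)
buffer = 0.5 * np.sin(2 * np.pi * 440 * t)
print(round(sound_level_db(buffer), 1))  # about -9 dBFS
```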

B. Classification

The k-nearest neighbor (k-NN) algorithm is also a classification algorithm, but it classifies data using training examples. Even when the target class is multi-modal, the algorithm can still achieve good precision. The major disadvantage of the k-NN algorithm is that it weights every feature equally when computing similarity. The accuracy of k-NN remains high in most cases, but as the size of the dataset increases, the accuracy of both systems decreases. Our results show that, in terms of overall accuracy, k-NN works more effectively; however, as the size of the dataset increases, the time the k-NN system takes to predict values also increases.
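A minimal k-NN sketch with scikit-learn (an assumed library choice): each training example is a fixed-length feature vector labelled with an emotion — here random vectors stand in for per-image features, purely for illustration — and an unseen vector is assigned the majority label among its k nearest neighbors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: one 128-dimensional vector per face image
# (e.g. an aggregate of its SIFT descriptors) with its emotion label.
rng = np.random.default_rng(0)
X_train = rng.random((60, 128))
y_train = rng.choice(["happy", "sad", "surprised", "angry"], size=60)

knn = KNeighborsClassifier(n_neighbors=3)  # every feature is weighted equally
knn.fit(X_train, y_train)

X_new = rng.random((1, 128))               # feature vector of an unseen face
print(knn.predict(X_new)[0])
```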

4. TEST CASES


5. ACKNOWLEDGMENTS

We would like to express gratitude towards our project guide Prof. Minal P. Nerkar for her expert advice and encouragement throughout this difficult project, as well as project coordinator Dr. K. S. Wagh and Head of Department Prof. S. N. Zaware. Without their continuous support and encouragement this project might not have been possible.

6. REFERENCES

[1] S. Yakkali, V. Nara, N. Tikone and D. Ingle, "Robust Object Detection and Tracking Using SIFT Algorithm," SIESGST, Mumbai University, Maharashtra, India.

[2] D. Rivero, E. Fernandez-Blanco, J. Dorado and A. Pazos, "A New Signal Classification Technique by Means of Genetic Algorithms and kNN," Department of Information and Communications Technologies, University of A Coruña, A Coruña, Spain.

[3] "Comparing Accuracy of K-Nearest-Neighbor and Support-Vector-Machines for Age Estimation."

[4] A. Alahi, R. Ortiz and P. Vandergheynst, "FREAK: Fast Retina Keypoint," in IEEE Conference on Computer Vision and Pattern Recognition, 2012.

[5] D. G. Lowe, "Object Recognition from Local Scale-Invariant Features," Proc. Seventh Int'l Conf. on Computer Vision, pp. 1150-1157, 1999.

[6] B. K. Bairagi, "Expressions Invariant Face Recognition Using SURF and Gabor Features," ARC India Pvt. Ltd., Kolkata, India.

[7] Z. Zeng and T. S. Huang, Beckman Institute, University of Illinois at Urbana-Champaign, 405 N. Mathews Ave., Urbana, IL 61801.

[8] "Recognition System using Parallel Classifiers and Audio Feature Analyzer," in 2011 3rd Int. Conf. on Computational Intelligence, Modelling and Simulation, 2011, pp. 210-215.

[9] "Review," European Journal for Scientific Research, vol. 33, no. 3, 2009, pp. 480-501.

[10] "Expression Analysis in Determining Emotional Valence and Intensity with Benefit for Human Space Flight Studies," 5th IEEE International Conference on E-Health and Bioengineering (EHB), pp. 1-4, November 2015.
