Data Science Seminar by Joao Paulo Carvalho – University of Lisbon

You are cordially invited to the Data Science Seminar that will take place on Friday, 28th October 2016 (TU/e AUD2).

Speaker: Joao Paulo Carvalho, INESC-ID’s Spoken Language Systems Laboratory
Title: Fuzzy Fingerprints: Identification and classification in Big Data using top-k values

Abstract: Fuzzy Fingerprints are a recently introduced technique inspired by the fact that many types of data studied in the physical and social sciences can be approximated by a Zipfian distribution, i.e., one in which the frequency of an item is inversely proportional to its rank in the frequency table. Fuzzy Fingerprints efficiently use the implicit information contained in the top-k most frequent data values to perform identification in large datasets. The term “fingerprint” is used in the sense that fingerprints are unique and are usually left unintentionally, allowing us to identify their “owners”. The fingerprint concept can be extended from single users to categories, topics or classes, allowing us to perform tasks such as classification and recommendation.

In this talk I will present the ideas behind Fuzzy Fingerprints and show case studies and applications, including: identification of anonymous users based on their phone and web usage habits; identification of text authors based on their writing habits; classification and identification in social data (e.g. detecting tweets related to a given trending topic); classification based on medical text data; and movie recommendation.
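
For readers unfamiliar with the top-k idea, the sketch below illustrates it in Python. It is only an illustration under stated assumptions, not the speaker's formulation: the linear rank-based membership, the min-overlap similarity, and the names fingerprint, similarity and identify are all hypothetical choices made for this example.

```python
from collections import Counter


def fingerprint(items, k):
    """Map each of the k most frequent items to a membership value in (0, 1].

    Assumption: membership decreases linearly with rank (rank 0 -> 1.0).
    """
    top = Counter(items).most_common(k)
    return {item: 1.0 - rank / k for rank, (item, _count) in enumerate(top)}


def similarity(fp_a, fp_b):
    """Sum of the smaller membership over shared items, scaled to [0, 1].

    Assumption: this min-overlap score is one simple way to compare
    two fingerprints; other membership and scoring functions are possible.
    """
    if not fp_a or not fp_b:
        return 0.0
    shared = fp_a.keys() & fp_b.keys()
    overlap = sum(min(fp_a[x], fp_b[x]) for x in shared)
    return overlap / max(len(fp_a), len(fp_b))


def identify(library, items, k):
    """Return the name in the library whose fingerprint best matches the items."""
    probe = fingerprint(items, k)
    return max(library, key=lambda name: similarity(library[name], probe))


# Toy usage with made-up web usage logs for two hypothetical users.
alice = "news news news mail mail maps".split()
bob = "games games games video video chat".split()
library = {"alice": fingerprint(alice, 3), "bob": fingerprint(bob, 3)}

print(identify(library, "mail news news maps".split(), 3))  # prints "alice"
```

Only the k most frequent values of each owner are stored, so the library stays small even when the underlying logs are large; the ranking, rather than the raw counts, carries the identifying information in this sketch.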

