BPI cluster meeting 4 Apr ’18

Our BPI cluster meeting on 4 April will be held in Pav. K.16 from 12:30 to 13:30.

During that session, Joao Paulo Carvalho will be our guest lecturer. Joao is from Instituto Superior Técnico, University of Lisbon, Portugal, and is currently on sabbatical in our group. For more information about Joao, see https://www.l2f.inesc-id.pt/w/João_Paulo_Carvalho

The title and abstract of Joao’s talk are as follows:

Fuzzy Fingerprints: Identification and classification based on top-k values


Fuzzy Fingerprints (FFP) were developed as a technique to allow the identification of an individual out of a large number of suspects based on their usage habits. They were inspired by the fact that many types of data studied in the physical and social sciences can be approximated with a Zipfian distribution, where the frequency of an item is inversely proportional to its rank in the frequency table. Fuzzy Fingerprints efficiently use the implicit information contained in the top-k most frequent data values to perform identification in large datasets. The term “fingerprint” is used in the sense that fingerprints are unique and are usually left unintentionally, allowing us to identify their “owners”.

“Identification” can be seen as a specific classification task where the number of classes is unusually large. Although originally used as a “user identification” technique, FFP have been extended to identify and classify anything from single users to categories, topics or classes, and have been shown to be competitive with machine learning techniques even when dealing with a small number of classes, while exhibiting some interesting properties.

In this talk I will present the ideas behind Fuzzy Fingerprints and show case studies and applications involving: identification of anonymous users based on their phone and web usage habits; text author identification based on writing habits; classification and identification in social data (e.g. detecting tweets related to a given trending topic); classification based on medical text data; movie recommendation; etc.
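
For readers who would like a concrete picture before the meeting, here is a rough Python sketch of the top-k idea described in the abstract. It is not taken from the talk: the function names, the linear rank-based membership and the min-overlap similarity score are all assumptions made for illustration, and the actual Fuzzy Fingerprint method may differ in each of these choices.

```python
from collections import Counter


def fingerprint(items, k):
    """Map each of the k most frequent items to a rank-based membership in (0, 1]."""
    top = Counter(items).most_common(k)
    # Assumed linear membership: the top-ranked item gets 1.0, the k-th gets 1/k.
    return {item: (k - rank) / k for rank, (item, _count) in enumerate(top)}


def similarity(fp_a, fp_b, k):
    """Sum the smaller membership over shared items, normalised by k (an assumption)."""
    shared = fp_a.keys() & fp_b.keys()
    return sum(min(fp_a[x], fp_b[x]) for x in shared) / k


def identify(unknown_items, known_fingerprints, k):
    """Return the label whose fingerprint is most similar to the unknown sample."""
    unknown = fingerprint(unknown_items, k)
    return max(
        known_fingerprints,
        key=lambda label: similarity(unknown, known_fingerprints[label], k),
    )


# Toy usage data (entirely made up): each "user" is a list of visited services.
K = 3
known = {
    "alice": fingerprint("news news news mail mail maps".split(), K),
    "bob": fingerprint("games games games video video mail".split(), K),
}
guess = identify("news mail news maps news".split(), known, K)
print(guess)
```

In this toy example the unknown sample shares its most frequent items with the first user, so that fingerprint scores highest. Only the k most frequent items per user are stored, which is what keeps the comparison cheap when the number of candidates is large.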
