An interview with David Ayman Shamma

Author/Interviewee: David Ayman Shamma
Editor/Interviewer: Michael Riegler



Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I’ve always been curious about solving problems. Not so much the answer itself; I like to know how a problem can be broken down into parts, abstracted, and reasoned with. That often drives us to think about abstraction (is there a non-specific instance of this problem?), theory (is there some known literature from the mathematical or social sciences that will help us frame what’s happening?), and analogy (can we solve this because its structure is like another problem?). My education included classes in psychology, philosophy, math, and engineering; eventually I realized Computer Science, and specifically Artificial Intelligence, embodied everything I was looking for: understanding people, modeling problems, and building new systems.

Interestingly enough, as an undergrad I took a job in an art department at the local state college as a technician; my job was to keep their Macs running with Adobe products. While I was there, I was allowed to audit studio art classes. I began to see how artistic and creative processes were influenced by the tools we have, be it a 1:50 D-76 bath with fiber-based paper in a darkroom or masking layers in Photoshop. This connection between creative and constructive processes carried into my work at NASA’s Center for Mars Exploration, where I worked on diagrammatic knowledge tools, and then into my Ph.D. on community-driven multimedia systems. It was around this time that I saw ACM Multimedia 2004 had a call for technical papers in the Interactive Arts. Since then I’ve been active in the community, mostly focused on the Arts track, but as my work began to include social computing in 2009 I started to think about hybrid social-visual systems. In 2013, I was the Technical Program Co-chair, and we started to look critically at the broad technical areas and the review process, and started some inclusion and diversity initiatives.

The main foundational lesson for me is to continue asking the right questions, even if you’re branching out of some smaller, under-represented area or track. In many cases, you’ll find exciting new research questions. That said, I found I need to couple this with a personal understanding of the outside domain; only then can a truly functional hybrid system work. It’s not enough to look at divergent sources as just a big bag of the same data. Pixels, tags, comments, clicks: they all carry an explicit or tacit semantic implication; respect that.

Tell us more about the vision and objectives behind your current roles. What do you hope to accomplish and how will you bring this about?

My Ph.D. dealt with social computing and community semantics: the objects in a photo carry the broader semantic and conversational context of the online site sharing that photo. When I graduated, I joined an industry research lab. I spent 10 years there through a few organizational shifts. In my last 4 years there I founded the HCI Research group with a charter of investigating what our research meant to people. My group’s research spanned several domains: multimedia, computer vision, information visualisation, social computing, ethnography, and physical computing; this gave me a deep perspective across many areas. Personally, understanding how things are connected and what those connections mean became a focus of my research. Data is created for a reason, and structured link data can carry a tacit semantic that helps us understand people and tasks in the world. Lately, I’ve been thinking about physical spaces where people interact and create content. What sort of camera do you have on you? How does it change your practice of photography? What sensors might be in your clothes or in the world? These questions have been part of my current focus at Centrum Wiskunde & Informatica. We’ve been working with a Dutch fashion designer in Amsterdam, investigating how fashion and technology can be used in various situational tasks and environments by instrumenting clothing and creating structured data to understand people’s activity and flocking. What’s exciting beyond the research is connecting the goals of a fashion designer with those of computer science research; it’s a rewarding bridge to create. Once all the fabric and sensors are accounted for, it becomes a social computing problem again… that’s where I like to live, creating bridges.

Can you profile your current research, its challenges, opportunities, and implications?

Now more than ever, we are a function of our own data. Data drives much of computing today, be it data science or machine learning. I like to emphasize how we collect and label data, as it has direct consequences on what we can analyze, predict, and create. For many, this means harvesting data for use. For me, it means understanding how people act, behave, and communicate through those signals. For example, at CSCW 2016 I published some work where we looked at the browsing behavior of millions of people on Flickr, which we matched to a relatively small set of editorial judgements to surface high quality geo-tagged weather photos. The alternate approach, which they did attempt at first, was to just train a neural net to find photos of storms or lightning or sunny days. While that’s recall optimistic, the editors were quick to point out that everyone takes crummy photos of lightning, so conventional approaches didn’t work. My research took a different approach: instead of training generic aesthetics into the system, we modeled a community-centric approach. Using the tacit aesthetic judgments from the Flickr community, we coupled the structured link data with a CNN to surface high quality photos. It’s not a case of active learning; in fact, it’s a supervised model where the supervision comes from implicit community actions and explicit editorial judgements. We have some similar work to be published at CHI 2017 later this year where we were surfacing deviant/abuse images on Tumblr, a task that was even harder as the image may not be representative of such behavior, so the social-visual system was a necessity.

Taking your interest in AI and fashion into account, I am wondering what you generally think about the current hype around deep learning, also in the context of fashion research. Do you think AI-based systems will ever be able to understand context, which is an important factor in fashion?

You know, I remember when Deep Blue beat Kasparov back in the 90s and while it was great, I didn’t think much of it as an AI victory (nor did IBM if I recall). The recent win by AlphaGo is different and something amazing. I don’t think it’s hype, as things work and work well; however, we still face many of the same limitations. With regard to fashion, it’s a great time to be excited about AI. I mean, we see solutions to many of the older research and fashion issues (like point your camera at someone and find the clothes they are wearing to buy online), but I think the combination of smart electronics, AI, and fashion is the new sweet spot. Advancements in textiles, like pixel-to-stitch knitting, together with small electronics make for a fun new playground for AI, sensors, and IoT. We’re just now starting to explore how clothes and fashion can sense, detect, and respond to people and to the environment. I get what you’re saying about AI hype and that’s another discussion, but right now I’m excited to build the next generation of wearable tech.

How generalizable is data from sources like Flickr? For example, are your insights on Flickr also valid in non-Western countries?

I certainly have had reviewers ask me how generalizable research is because it used Flickr data or Yelp data or Twitter data or whatever; I see it as the hallmark of a bad review. There is no reason to believe that any slice of a specific social media dataset should be generalizable. People act differently on Flickr than they do on Instagram or on Snapchat. The application/website dictates an interaction, and really that’s what we are studying. As a research community we need to move beyond just studying naive pixels and examine what that interaction is doing. Ok, if you’re just looking for indoor vs outdoor shots in Yelp photos, then maybe. But have you ever tried to find a restaurant in Japan versus Italy versus America? Store fronts look completely different. Internationalization is rarely studied by multimedia researchers, and I think multimedia-mediated cultural communication is more important than website generalization.

I think it would be very interesting if you could also say what you think the role or responsibility of multimedia researchers is in the context of the fake news/alternative news debate. Do you think we should focus on it?

In 2009, I began publishing work on multimedia summarization using aggregated Twitter feeds from the Obama-McCain debate. Back then, people really, really wanted to tweet and it was a narrow interest community. A few years later, during the Egyptian revolution of 2011, I ran my methods against the Twitter firehose and saw some misinformation (like a reported bus on fire that was actually from another country years ago). Delayed information is a systemic problem, where something happened hours or days ago and it gets propagated as fresh information. I don’t believe we had widespread purposeful propagation of misinformation back then (at least not like what we see in today’s world). So today, we have misplaced information, delayed information, and fake/alt information, and the field of multimedia is ripe to handle this problem. For example, take a fake news story with a photo. Has the photo been altered to retell a story? Is the photo from a different news story? Are there clusters of other news sources that contradict it? There’s a whole world of multimedia problems in finding fake news, many of which large companies are struggling to get a grip on, but the hard problem will be the explanation. Identifying fake news is half of the problem; explaining to people why it’s fake is the other. News, now more than ever, is highly visual (photos/video) and social; dealing with a plurality of signals is the core of multimedia research.

In this context, do you think that fake news is a problem of social network platforms, or should newspapers also be investigated?

Can you name a news source that does not rely on social network platforms? Conversely, have you seen Twitter deliver news? Their streaming video with tweet interfaces speaks to research we did 10 years back. I don’t think we can decouple the two, but we’ve seen how social media sites tend to amplify things by propagating clickable content. So for a news agency, it starts with the title and snippet of a story and its related photo. But then there are also the fake news agencies gaming the social sites. There’s been some great work from UW cracking the problem, but I think it’s time for multimedia research to step up here, as visual content always carries more engagement.

How would you describe the role of women especially in the field of multimedia?

Diversity of all types (gender, nationality, race) is critically important to the future of multimedia research. When I was on the TPC for Multimedia in 2013 I did some data analytics of the past several years of the conference series; the gender stats were abysmal. We worked hard to increase the gender diversity in the area chairs and in the conference. To the former, following some advice from Maria Klawe I heard in a lecture maybe 10 years prior, we pushed on topic diversity for the conference. The idea here is that legacy areas can carry legacy diversity problems, so newer areas (social computing, affect, crowdsourcing, music, etc.) are more likely to have better gender leadership ratios. It was the correct approach and we doubled the number of women in leadership roles in the ACs, but there was still much room to grow. We coupled this with finding corporate support for a women’s & diversity lunch, a practice that I’m happy the conference has continued. Diversity brings an expanded set of ideas, methods, and approaches in research. We’ve come a ways since 2013 and I’m very happy to see the 2017 program similarly expand its diversity, but we have a very long way to go to catch up to some other SIGs.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

Impact happens where research connects to people. For me, it usually revolves around creative practice in multimedia. How do online broadcasters DJing house and hip-hop connect with their audience online, and how does it differ from when they are in a club? If you have an iPad and an iPhone and want to take a picture, when do you reach for the iPad to take the photo? If you’re posting a photo to Instagram, what filter will you use to enhance the photo? The most valuable research includes method, system, and people. Let’s take that last question, the Instagram filters, as an example. One could build a prediction model to automatically apply filters based on a training set of what got likes and the types of transformation, but would that change people’s creative practice? We found people enjoyed the process of selection (despite usually picking the same filter over and over again). So the question becomes how do we optimize the experience without hindering it.

In my time as Director of Research at Flickr, we enjoyed looking at the full stack: data, machine learning, engineering, visualization, and all the components that affect people and media experience. We knew there was an advantage in being able to easily dive into 13 billion photos and 100 million people but felt, even inside a corporation, there should be more open data for all researchers. This led to the creation of the YFCC100M: 100 million Creative Commons images in a single dataset for open research. Beyond the data itself, we found ourselves reviewing small technical Creative Commons details to ensure legal and privacy concerns were met while still opening the data for wide academic and corporate use. The impact has been incredible. Outside of the multimedia and computer vision communities, in the first year since release we’ve seen published work using our dataset from the HCI, Data Science, and Visualization communities, and we were even featured by the Library of Congress. All driven by the idea to share data we felt was too locked up; fortunately Flickr, Creative Commons, and Yahoo Legal shared our vision and we’ll look to see more impact to come.

Over your distinguished career, what are the top lessons you want to share with the audience?

Really nothing happens in a vacuum. Partnerships and collaborations make things interesting as they make one malleable and push one to think full stack. This is shaped by my 10 years in an industry lab, where connecting with academia through hosting interns, collaborative work, and sponsorships really fueled my work. I’d say a good 70% of our work was still internally driven, but that 30% outreach was really valuable. Now at an academic lab, I’m doing the reverse. We partnered with a fashion designer to keep connected to their goals and their problems while we think about the wearable and social Internet of Things. It’s great to think without constraints, but really adapting to the real world and thinking end-to-end is a critical driver for me. At the end of the day, I want to use what I build. Build what you love and make it real. This was easier when I was at a corporation, but there are still plenty of ways to collaborate depending on scope. And really think full stack in system and evaluation. You’ll find yourself evaluating your work on multiple levels, from F1 metrics to Likert scale surveys. What we do is develop new systems and methods, but work with real impact will affect applications and design. My favorite research (of mine or others) always critically engages with the bigger picture.

Since you are an active researcher in both the US and Europe, what do you think are the main differences? What is positive and what is negative? And what could we learn from each other?

I did a semester sabbatical at the Keio-NUS CUTE center in Singapore a few years back, so it’s not my first dive outside of industry. I’m reminded that in La Nausée Sartre wrote that any place you live feels the same after two weeks; the idea being once you get back to job and life, it becomes the same again. I can’t say I quite agree in this case. The move from an industry lab in California to an academic one in the Netherlands was a bit of a culture and cadence shift. After almost a year, it’s clear to me that the difference is the pace, as we share a research culture. We tend to sprint constantly in industry, and the sprinting seems to come and go in academia. Each style has its pros and cons; there have been times I wanted everyone to be running and times I was happy I could dive into something because we weren’t running. I don’t think it’s something where you enumerate positive and negative points, just a different state of being. I’m not sure why I gave you an existential response either.


About David Ayman Shamma:

I am a Principal Investigator and Senior Scientist at Centrum Wiskunde & Informatica (CWI) where I lead a team looking at Social Computing, Internet of Things (IoT), and fashion. Formerly, I was Director of Research at Yahoo Labs where I ran the HCI Research Group and I was the scientific liaison to Flickr (where I co-founded the Data-science group). Broadly speaking, I design and prototype systems for multimedia-mediated communication, as well as develop targeted methods and metrics for understanding how people communicate online in small environments and at web scale. Additionally, I create media art installations that have been reviewed by The New York Times, International Herald Tribune, and Chicago Magazine and exhibited internationally, including Second City Chicago, the Berkeley Art Museum, SIGGRAPH ETECH, Chicago Improv Festival, and Wired NextFest/NextMusic.

I have a Ph.D. in Computer Science from the Intelligent Information Laboratory at Northwestern University and a B.S./M.S. from the Institute for Human and Machine Cognition at The University of West Florida. Before Yahoo!, I was an instructor at the Medill School of Journalism; I have also taught courses in Computer Science and Studio Art departments. Prior to receiving my Ph.D., I was a visiting research scientist for the Center for Mars Exploration at NASA Ames Research Center.

Michael Alexander Riegler: 

Michael is a scientific researcher at Simula Research Laboratory. He received his Master’s degree from Klagenfurt University with distinction and finished his PhD at the University of Oslo in two and a half years. His PhD thesis topic was efficient processing of medical multimedia workloads.

His research interests are medical multimedia data analysis and understanding, image processing, image retrieval, parallel processing, gamification and serious games, crowdsourcing, social computing and user intentions. Furthermore, he is involved in several initiatives like the MediaEval Benchmarking initiative for Multimedia Evaluation, which this year runs the Medico task (automatic analysis of colonoscopy videos).
