Revolutionizing Photography With an AI-Based Image Classifier

Most people would describe Lana Palmer’s career path as unorthodox. A music producer who has worked on over a hundred TV shows and commercials for CBS, Fox, NBC, and MTV, Palmer also maintained a successful side hustle as a photographer, taking headshots for film and theater actors. When the pandemic hit, Palmer’s inbox began filling up with news of project cancellations and suspended seasons, pushing her to reevaluate her career options. Not one to shy away from a challenge, Palmer once again did something unexpected: she turned to data science.

“Back in 2011 when I first became interested in data science, I felt like I could either be a data scientist or an artist—I couldn’t be both,” Palmer said. “I now know that being skilled in one area doesn’t preclude me from building skills in another.”

Photography and data science might not seem like the easiest disciplines to combine, but Palmer found a way. She enrolled in Springboard School of Data’s Data Science Career Track and used her experience as a photographer to create an inspired capstone project: an image classifier called EyeSpy that automatically weeds out unflattering photos of a subject.

Palmer’s project was completed and published through Springboard’s School of Data (SoDA), which offers mentor-led training in data science, data analytics, data engineering, and machine learning engineering.

AI photography for the selfie generation

Computational photography is the practice of using digital software to enhance photographs in real time or in post-processing: a smartphone camera that automatically blurs the background when a human face is detected, for example, or a photo library app that uses object classification to tag photos and turn a disorderly archive into a searchable database. The market for AI in photography was worth $10.7 billion in 2019 and is expected to reach over $29 billion by 2024.

While Palmer’s current iteration of EyeSpy is considered a post-processing tool, she ultimately envisions it as a feature embedded within a smartphone camera that notifies the user in real time when one or more subjects are caught with partially closed eyes so they can immediately delete the photo or reshoot.

Before deciding to create her own image classifier, Palmer searched for this seemingly obvious feature in the photography software applications she’d been using for years. But even Adobe’s Lightroom, regarded as the gold standard of photo editing software, lacked this type of automation. After creating her image classifier, Palmer applied for a data science job at Adobe; she is waiting to hear back. “I think Adobe Lightroom could really benefit from this feature I’m working on, so I wanted to see if they would hire me,” she said.

“If I caught someone blinking or while they were saying something, it’s not flattering and I don’t want them to have to see those photos,” Palmer said. A typical headshot photo session yields anywhere from 300 to 600 photos, many of which get scrapped. “It’s a very vulnerable thing for people to see images of themselves—that’s something I’ve come to learn—and [the number of photos] can be very overwhelming.”

“Say you’re trying to take a family photo with five people in it, and two days later you look at it and you realize someone’s eyes are closed but you just didn’t know at the time,” Palmer explained. “There should be a notification that pops up and says hold on, someone was blinking.”

Add the complexity of taking photos with babies or very young children, and it can be extremely difficult to get the perfect shot where everyone’s eyes are focused on the camera.

Aside from helping photographers process images faster, Palmer’s EyeSpy also has clear consumer appeal. Social media influencers snap anywhere between 20 and 200 photographs per session to get an Instagram-worthy shot. “People are starting to gravitate towards influencers who are authentic rather than those who seem to have a perfect life or always take the perfect photo,” said Nabila Gardena Putri, a social media influencer from Indonesia who has over 583,000 followers on Instagram. “But there’s still a lot of curating you have to do—and that’s what you don’t see on the Instagram page.”

An image classification algorithm like Palmer’s can also have many uses outside photography. An image can be categorized as ‘usable’ or ‘unusable’ based on virtually any predefined criteria. For example, radiologists take multiple MRI scans in a procedure that lasts anywhere from 15 to 90 minutes. Sometimes the scans come out blurry if the patient doesn’t hold completely still. Having an image classification algorithm detect poor image quality could reduce the amount of time patients spend in the MRI scanner—a difficult experience even for those who aren’t typically claustrophobic.

"The same model concept used here could be expanded to anything a photographer might want to detect in their images such as such as bad lighting, or even to applications outside of photography from medical imaging—such as determining whether a radiology image is usable or not—to satellite imaging, such as classifying structures as in good condition or not following a natural disaster,” said Blake Arensdorf, a senior research scientist in people data science at Amazon and Palmer’s mentor during her Springboard data science program.

Image classification has also been used on video footage, though not as widely. When this technique is applied to movies, it’s known as scene detection. Now a senior machine learning engineer at Netflix, Amir Ziai wrote a thesis while studying at Stanford on how to use a binary classifier to detect and extract kissing scenes in films to help automate rating assignments (e.g., classifying a movie as PG-13 versus R). In a more general context, “accurate scene detectors can be used to enrich the video’s metadata with scene types and segments that can be easily searched and retrieved by users,” Ziai wrote. Similar to the way automated photo tagging in libraries like Google Photos helps users organize, find, and retrieve their images, video tagging based on scene detection can help improve the searchability of individual movies and TV shows in a vast library like Netflix, which has nearly 14,000 titles.

How Palmer built a convolutional neural network to differentiate between usable and unusable images

Determining whether or not the subject of a photograph is blinking is an image classification problem (‘blink’ versus ‘no blink’). Image classification is the process of taking an input (like an image) and outputting a class, or a probability that the input belongs to a particular class. To solve it, Palmer decided to create a convolutional neural network (CNN), a type of deep learning neural network commonly used to sort images into one or more categories.

The network learns what features to look for as it trains on a set of images, rather than the features being manually specified. This is a type of supervised learning, in which a model trains on data labeled by humans so it can learn to recognize objects for itself. For example, if a CNN is shown enough images of cats labeled ‘CAT’ and images of dogs labeled ‘DOG’, it figures out on its own which characteristics distinguish cats from dogs. In traditional machine learning techniques such as logistic regression, a human would have to specify exactly which characteristics the algorithm should consider, such as the length of the whiskers or the shape of the ears.

CNNs learn feature detection through a stack of hidden layers (sometimes tens or hundreds of them), with each layer increasing the complexity of the learned features. Palmer built her initial model using the Gradient machine learning platform on Paperspace, a cloud computing platform that can be used to develop and train ML models of any size and complexity. The model itself was a three-layer sequential model: in Keras, a Sequential model is simply a linear stack of layers in which the output of each layer feeds into the next.
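As a rough illustration, a three-layer sequential CNN for this kind of binary classification might look like the following Keras sketch. The image size, filter counts, and layer choices are assumptions for illustration, not Palmer’s exact architecture.

```python
# A minimal sketch of a three-layer sequential CNN for binary
# 'blink' vs. 'no blink' classification. Layer sizes, image size,
# and filter counts are illustrative guesses, not Palmer's model.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 3)),        # RGB input, resized to 128x128
    layers.Conv2D(16, 3, activation="relu"),  # 1st convolutional layer
    layers.MaxPooling2D(),                    # downsample the feature maps
    layers.Conv2D(32, 3, activation="relu"),  # 2nd convolutional layer
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),  # 3rd convolutional layer
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),    # outputs P(image is a blink)
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",     # standard loss for two classes
              metrics=["accuracy"])
```

Before a model like this could be trained, though, the images had to be labeled, and the line between ‘blinking’ and ‘non-blinking’ isn’t always clear-cut.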

“There were some edge cases where their eyes were partially closed, and I had to make some decisions about what I felt was a blink,” Palmer explained. “If I couldn’t see the whites of their eyes and their pupil, that would be considered a blink.”

Palmer generated her own image dataset from her collection of archived client images, manually labeling them and splitting them into ‘blinking’ and ‘non-blinking’ groups, with 500 photos in each set. The variance in eye shapes across people of different genders, races, and ages also made classification more complicated. For this reason, Palmer wanted to ensure that she curated a demographically diverse dataset to reduce model bias—a common problem in facial recognition systems used by law enforcement agencies, which disproportionately misidentify subjects who are female, Black, or between 18 and 30 years old.

The approximate demographic breakdown for Palmer’s dataset is as follows:

  • Gender: Female 54%; Male 43%; Non-binary 3%

  • Race: White 60%; Black 27%; Asian 8%; Other 3%

  • Age: Under 25 8%; 25-50 49%; Over 50 38%
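With images hand-sorted into one folder per class, a dataset organized this way can be loaded and split directly in Keras. The following is a minimal sketch; the folder names, image size, and 80/20 split are illustrative assumptions, not details from Palmer’s project.

```python
# Hypothetical sketch: load a hand-labeled dataset where images sit in
# 'eyespy_dataset/blinking/' and 'eyespy_dataset/non_blinking/' folders.
# Paths, image size, and the 80/20 split are assumptions for illustration.
import tensorflow as tf

train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    "eyespy_dataset",        # parent folder with one subfolder per class
    labels="inferred",       # label each image by its subfolder name
    label_mode="binary",     # 0/1 labels for a two-class problem
    image_size=(128, 128),   # resize to match the model's input shape
    validation_split=0.2,    # hold back 20% of the images for testing
    subset="both",           # return (train, validation); needs TF 2.10+
    seed=42,                 # fixed seed so the split is reproducible
)
```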

Once Palmer had built the initial model, she tuned its hyperparameters using the Keras Tuner in TensorFlow, adding and subtracting filters and convolutional layers (the layers that apply learned filters across the input image) until she found the model with the best combination of layers and filters. The process was relatively straightforward, she said, but sometimes the code would break.
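A search like that can be automated rather than done by hand. Here is a hedged sketch using Keras Tuner, where the number of convolutional blocks and the filters per block are the tuned hyperparameters; all ranges and the trial budget are illustrative, and train_ds and val_ds are the datasets from the earlier loading sketch.

```python
# Hedged sketch of hyperparameter tuning with Keras Tuner: search over
# the number of convolutional layers and the filters in each layer.
# All ranges and the trial budget are illustrative assumptions.
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Input(shape=(128, 128, 3)))
    # Try 1-3 convolutional blocks, each with 16-64 filters
    for i in range(hp.Int("conv_blocks", 1, 3)):
        model.add(layers.Conv2D(
            hp.Int(f"filters_{i}", 16, 64, step=16), 3, activation="relu"))
        model.add(layers.MaxPooling2D())
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
tuner.search(train_ds, validation_data=val_ds, epochs=5)
best_model = tuner.get_best_models(num_models=1)[0]
```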

“Most of the calls I had with Blake were him helping me troubleshoot or debug something that should have worked but wasn’t actually working,” Palmer said of her weekly discussions with her mentor.

When it comes to building a classification model, the dataset is split into two parts: 

  • training data; and 

  • testing data

The training dataset consists of data that is used to train the model, while the testing dataset is held back from the model during training, and is used to validate the model’s accuracy. Test accuracy is measured against the training accuracy to see how well the model classifies objects based on input data it has never encountered before.
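Reusing the model and datasets from the earlier sketches, that comparison comes down to two evaluate calls:

```python
# Sketch: compare accuracy on data the model trained on vs. data it
# has never seen; a large gap between the two suggests overfitting.
train_loss, train_acc = model.evaluate(train_ds)
test_loss, test_acc = model.evaluate(val_ds)
print(f"training accuracy: {train_acc:.0%}, test accuracy: {test_acc:.0%}")
```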

The final CNN classified blinks versus non-blinks with 79% accuracy on the testing dataset and 86% accuracy on the training dataset. For a classification problem, the ideal score approaches 100% accuracy, although perfect accuracy is not realistic and can be a sign of overfitting.

“It was a bit of trial and error and tweaking to get it to that point,” she said. “I know there’s definitely more work I can do on this project to get the performance up even higher.”

Turning a capstone project into actual software

Palmer is optimistic that her photography business will pick back up once the pandemic begins to ease. When that happens, she plans to start using her image classifier in her work instead of hiring an assistant to sort images for her. Asked if she had managed to design an algorithm that could replace a human, she said, “Yes, at least for the initial model—because it’s something that’s so simple, right? All I’m doing is allowing the model to be a set of eyes for me to filter out [photos].”

In fact, Palmer sees so much potential in her model that she plans to continue working on it even after she graduates from her Springboard data science program. By adding more filters and exposing the model to more training data, she can train it to detect a greater variety of features that can be used to classify images as desirable versus undesirable. 

“The technology of the future is in automation, having people do more in less time,” said Brandon Groce, a UX designer and content creator with over 104,000 followers on Instagram. “I believe this technology is applicable even to those outside of the influencer world, allowing us, if we choose, to automatically keep the moments in our camera rolls we want, and automatically discard the moments we don’t.”

Ultimately, Palmer envisions building an unsupervised learning model that gleans a user’s preferences over time based on which photos they tend to eliminate and which they keep. This “virtual photo assistant” could take the form of an iOS app for the iPhone or an add-on feature for a smartphone camera. “If you were able to learn from that, you could make this really targeted piece of software that would immediately filter out the photos that the user doesn’t like,” she explained.

Take, for example, universally unflattering features like one eye appearing smaller than the other, or the impression of a double chin where there isn’t one, created simply by a poor camera angle. “Everyone has things they don’t like about themselves—that’s one of the things I’ve learned as a photographer,” Palmer said. “People will say, I hate this eye, it always looks a little weird, and I’ll say ‘It looks normal, I don’t see it.’ But for them, it’s something they’re very particular about.”

Palmer plans to create an actual prototype of a mobile app or software application based on her image classifier so she can train the model further before monetizing it on the app store or licensing it to a software company. 

“I could allow other people to use it for free in exchange for getting to collect and learn from the photos they add to it to build up the dataset that way,” she said. “With these kinds of projects, the bigger your dataset, the more accurate the model will ultimately be.”

About the Authors

Lana Palmer is a storyteller with experience in photography, theater, and music production. She runs a small theater company, Bread and Butter Theatre, where she and her husband produce new and classic works. Palmer also holds an MBA from Queen Mary University of London.

Blake Arensdorf is a data scientist at Amazon focusing on people analytics and is deeply passionate about mentoring other would-be data scientists. He holds a B.S. in Applied Mathematics from the University of Colorado at Boulder, and is an ICF Associate Certified Coach.