From Internet Memes to Scientific Research: Creating a Novel Image-to-Image Translation Model

How a former physicist (turned data scientist) applied deep learning models to study metamaterials.

When Adrien Saremi, a Ph.D. candidate at the Georgia Institute of Technology, enrolled in a data science program last year, he wasn’t sure whether his physics degree would help or hinder his transition. To his surprise, Saremi found his knowledge of physics gave him a competitive advantage over the other students in the data science course.

When it came time for Saremi to pick his capstone project, he zeroed in on metamaterials, the topic he’d dedicated his academic life to studying. Metamaterials are materials with synthetically engineered properties that are increasingly used in electronic, optical, and mechanical applications. Scientists can program metamaterials to control which regions absorb or reflect light, which has led to experiments with electronic displays that use almost no power but can still show full-color images. In theory, a Harry Potter-style invisibility cloak could be created from metamaterials. In fact, the US Army is reportedly funding research into “cloak-like structures that can steer mechanical wave energy around objects, protecting [soldiers] from blasts, shockwaves, earthquakes or vibration.”

Creating a project that could marry data science with physical science and be accessible outside of an academic context was not an easy task. While Saremi was searching for a way to tackle the problem, a chance encounter with Google’s Deep Dream generator gave him the answer.

The online platform generates abstract visual art using a convolutional neural network, a deep learning algorithm most commonly used to analyze visual imagery. Computer-generated art is nothing new, but the machine learning models that support it have a vast and mostly uncharted potential to upend entire industries.

One of the most popular techniques for generating art using machine learning is called image-to-image translation. Currently, image-to-image translation is mostly used in the service of entertainment. Take Snapchat’s face swap filter, for example, in which users overlay images of their own face with that of their pet or a celebrity. Pinterest has an entire page dedicated to “Face Swap Fails,” which includes rather disturbing images of grown men swapping faces with their babies, a Teletubby doll, and even a burger. Call it surrealist art from the internet age, if you will. There is also the viral video in which actor Henry Cavill’s mustache was edited out of Warner Bros.’ “Justice League” using only a $500 computer and an AI algorithm, which some said did a better job than the studio’s CGI department.

Image-to-image translation can also be used for more nefarious purposes, like generating deepfakes, photorealistic fake videos created using AI. In 2019, two artists put out a deepfake of Facebook founder Mark Zuckerberg bragging about exerting “total control” over billions of people’s personal data.

This is the problem Saremi’s data science capstone project attempts to solve. Saremi created an image-to-image translation model that automatically maps the points found in microscopic metamaterials and characterizes the type of bond connectivity according to three of the most commonly encountered bond structures: kagome, square, and triangular lattices. This allows scientists to quickly categorize a material’s mechanical properties and manipulate them even at a microscopic level. “If you’re only able to capture the points through a microscope, you want to be able to have an automatic program that will connect the dots together,” Saremi said. “But they can’t be connected randomly; they need to follow a pattern.”
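
To see why the dots cannot simply be joined by a fixed rule, consider a naive baseline that connects any two points closer than a chosen distance. This is not Saremi’s model, only a hypothetical illustration: the function name `connect_within`, the toy grid, and the threshold values are all assumptions made for this sketch.

```python
import math

def connect_within(points, max_dist):
    """Return index pairs (i, j), with i < j, whose points are at most max_dist apart."""
    bonds = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                bonds.append((i, j))
    return bonds

# A perfect 3x3 square grid with unit spacing (assumed toy data).
grid = [(float(c), float(r)) for r in range(3) for c in range(3)]

square_bonds = connect_within(grid, 1.2)    # nearest neighbours only: 12 bonds
with_diagonals = connect_within(grid, 1.5)  # diagonals sneak in as well: 20 bonds
```

On this ideal grid, a threshold of 1.2 recovers the square-lattice bonds, while 1.5 also adds the diagonals. With disordered points and an unknown lattice type, choosing the threshold by hand becomes harder, which is the kind of gap a learned model aims to fill.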

While Saremi’s project focuses on a niche area of physics, the ability to automatically connect points according to predetermined patterns or models has widespread potential use cases, such as predicting GDP from historical trend lines or analyzing the kinetic energy density within a molecule.

Some of these applications are already in use today. Google used a similar technique of image-to-image translation to stitch together satellite images to create the Google Maps application used by over one billion people around the world every month.

Training a machine learning algorithm using a proprietary dataset

To train his model to recognize common bond structures, Saremi used a Generative Adversarial Network (GAN), a type of unsupervised machine learning algorithm that discovers patterns in input data and learns to generate new examples derived from it.

This model is used in image-to-image translation, which takes images from one domain and transforms them so they have the style (or characteristics) of images from another domain, such as a photo of the Eiffel Tower rendered à la Van Gogh’s “Starry Night.”

The GAN consists of two machine learning models: a generator and a discriminator.

The generator attempts to generate new examples or derivations of the input data that can pass for real data.
The discriminator tries to tell the generator’s outputs apart from real data, classifying each example as real or fake.

“The generative model is constantly learning to draw a bit better because of the feedback the discriminator is giving it, while the discriminator is constantly getting a little better at classifying because the drawings [become more sophisticated],” said Allen, the data scientist who mentored Saremi throughout his course.
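
The feedback loop described above can be sketched in a few lines. What follows is a toy, one-dimensional illustration, not Saremi’s image model: the generator is a straight line G(z) = a*z + b, the discriminator is a logistic function D(x) = sigmoid(w*x + c), and the “real” data are numbers drawn around 4.0. All names and values (`train_toy_gan`, the learning rate, the data distribution) are assumptions.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, lr=0.01, real_mean=4.0, real_std=0.5, seed=0):
    """Alternate one discriminator update and one generator update per step."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        x_real = rng.gauss(real_mean, real_std)  # a sample of "real" data
        z = rng.gauss(0.0, 1.0)                  # random noise fed to the generator
        x_fake = a * z + b                       # the generator's attempt

        # Discriminator step: gradient ascent on log D(x_real) + log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: gradient ascent on log D(x_fake), using the
        # discriminator's updated opinion as feedback.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = (1.0 - d_fake) * w
        a += lr * grad_x * z
        b += lr * grad_x
    return {"generator": (a, b), "discriminator": (w, c)}

result = train_toy_gan()
```

Because this discriminator is so simple, it can do little more than compare averages, so the sketch is best read as a picture of the alternating updates rather than a working generative model.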

Saremi trained his GAN model to learn the three common bond structures of metamaterials—kagome, triangular, and square lattices—and automatically apply the bond connections to new images. Images of metamaterial bonds aren’t widely available, so creating a dataset wasn’t a simple matter of web scraping Google Images or Flickr. Instead, Saremi generated his own training data by creating hundreds of images of metamaterial bond structures to train his model from scratch.

Unfortunately, it’s not as simple as drawing the same lattice hundreds of times. There’s a degree of disorder—even the square lattices aren’t perfectly square—so Saremi had to introduce an element of randomness. To generate the images, he used Mathematica, a modern technical computing system used by physics researchers, mathematicians, and data scientists.
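
As a rough illustration of what introducing randomness can look like, the sketch below builds square or triangular lattice points, nudges each one by a small random offset, and records which points are bonded. It is written in Python rather than Mathematica, it is not Saremi’s code, and the function name `disordered_lattice`, its parameters, and the choice of uniform jitter are all assumptions. Kagome lattices are left out to keep it short.

```python
import math
import random

def disordered_lattice(rows, cols, kind="square", spacing=1.0, jitter=0.1, seed=None):
    """Return (points, bonds) for a lattice whose points are randomly displaced.

    points is a list of (x, y) tuples; bonds is a list of index pairs into points.
    """
    if kind not in ("square", "triangular"):
        raise ValueError("kind must be 'square' or 'triangular'")
    rng = random.Random(seed)
    row_height = spacing if kind == "square" else spacing * math.sqrt(3) / 2
    points, bonds = [], []
    for r in range(rows):
        for c in range(cols):
            # Each triangular row is shifted half a spacing further than the last.
            shear = 0.5 * spacing * r if kind == "triangular" else 0.0
            x = c * spacing + shear + rng.uniform(-jitter, jitter)
            y = r * row_height + rng.uniform(-jitter, jitter)
            points.append((x, y))
            i = r * cols + c
            if c + 1 < cols:
                bonds.append((i, i + 1))             # neighbour in the same row
            if r + 1 < rows:
                bonds.append((i, i + cols))          # neighbour in the next row
                if kind == "triangular" and c > 0:
                    bonds.append((i, i + cols - 1))  # second neighbour in the next row
    return points, bonds

points, bonds = disordered_lattice(4, 5, kind="square", seed=42)
```

Calling the function with different seeds yields many slightly different versions of the same lattice. Each call returns both the points and their bonds, so the points alone could serve as a model’s input and the bonds as the pattern it should learn to draw.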

“It is a pretty unusual thing for someone to be generating their own data for a project like this. Most people just use datasets that are available on the internet,” said Allen, who was impressed by Saremi’s ability to combine his data science skills with his deep domain expertise in physics. “But Adrien assembled a very domain-specific dataset and used a cutting-edge machine learning technique.”

Instead of positioning his project as a saleable tool for the physics research community, Saremi says it’s more of a proof of concept that can open people’s minds to the various ways GANs can be used beyond just removing mustaches. In fact, the model can be trained on more than just image data; it can also take sets of numbers and generate numerical predictions, like an economic forecast.

“The motivation for this project was a physics problem, but you can sell it based on the need to identify points and connect them, which is an image classification problem,” said Saremi. “You could technically use the same model architecture for another type of physics problem or even an economics problem where you’re trying to forecast something.”

Looking ahead

Three months after graduating from Springboard’s data science course, Saremi is just weeks into a new job as a data scientist at Cognira, an analytics consulting firm that helps make AI and machine learning more accessible to retailers.

Saremi’s project has already generated interest in the data science community. After he wrote a post on Medium explaining the project in detail and how he built his GAN model, the article was republished by Towards Data Science, a well-known publication in the data science community.

Even if he doesn’t build on his project or try to commercialize it, Saremi hopes deep learning methods like GANs will be more widely used and appreciated to solve important problems in science as well as business. Audio applications such as speech recognition and sound transfer could especially benefit from the model’s generative features, for example by transforming a piece of music from classical to jazz or vice versa. Because the model can take input data in many forms, including images, numbers, and audio, GANs have a wide range of potential applications.

“There are many more applications of such deep learning methods aimed toward image generation algorithms—the generation of road maps from satellite images, for example, or the restoration of paintings, photography enhancement, and so on,” Saremi said.

In the end, Saremi’s physics background gave him a distinct advantage in his transition to data science, allowing him to utilize his scientific knowledge and experience to create a capstone project that has won him accolades throughout the industry. Saremi hopes his experience will inspire others with a background in science to take the leap.

“I think in the end, this work was the perfect illustration that the study of physical systems exposes people to other fields of study that are more practical.”

About the authors

Dr. Adrien Saremi is a recent Ph.D. graduate from the Georgia Institute of Technology, where he specialized in researching how to formulate a mathematical framework to study and quantify the mechanical response in metamaterials. After completing Springboard’s Data Science Career Track, he landed a job at Cognira as a data scientist, where he provides AI solutions to enterprise retailers.

Lucas Allen is a Springboard mentor and senior data scientist at Red Ventures, where he leads data science for Red Ventures’ Higher Education verticals. He works primarily on revenue optimization problems for a business with more than 100 sites and college partners.

Adrien’s project was completed and published through Springboard’s School of Data (SoDA), which offers mentor-led training in data science, data analytics, data engineering, and machine learning engineering.

Saremi wondered, ‘What if I used image-to-image translation to do something more constructive, like apply it to fundamental scientific research?’ Saremi decided to pair a convolutional neural network with metamaterials to do just that. “The study of metamaterials is recent and I feel like this project was a great way to expose the public to these new exciting systems,” Saremi said.

Putting image-to-image translation to good use

Over the course of four weeks, Saremi enlisted the help of Lucas Allen, a data scientist at Red Ventures and Saremi’s mentor in Springboard’s data science bootcamp. Together, they refined the idea, which evolved from a model that would calculate a single number to solve a physics problem into one that uses an up-and-coming machine learning method to help physicists better understand metamaterials. The project would also demonstrate how image-to-image translation could be used for more than just cheap amusement.

When researching metamaterials, scientists first characterize the material’s bond structure so they can better understand its mechanical properties. They can then manipulate those properties to design the desired chemical structure and composition using CAD (computer-aided design) software. Say you want to design a helmet, for example. You want the inside of the helmet to be soft and the outside of the helmet to be rigid. Metamaterials allow scientists to design a system that uses only one type of material yet exhibits both properties. Think of the T-1000 from “Terminator 2,” which is made of liquid metal and can slip underneath doors by turning into liquid or form its hands into stabbing blades. Metamaterials allow scientists to take a step toward turning these science fiction dreams into reality: the US Army is developing self-healing, shape-shifting drones inspired by the character. These drones are made from polymers, materials composed of long, repeating chains of molecules whose dynamic bonds allow them to go from liquid to solid many times.

However, these bond structures are microscopic, making it difficult for scientists to determine bond connectivity and to categorize a material as soft or rigid. What’s interesting about metamaterials is that they don’t necessarily exhibit a uniform structure, unlike a slab of concrete or an elastic band. Sometimes the points (or nodes) are easy to see, but a scientist has to infer what type of bond structure holds them together. While these bonds occur at the molecular level, the process is somewhat analogous to forensic facial reconstruction, where anthropologists recreate a person’s facial features by studying various points of the skull. The distance between the eye orbits, the shape of the nasal bones, and the form of the chin indicate the depth of tissue to be added to the skull.
