With most of Rover’s Data Science Platform team out for the holidays, I had the chance to tackle a project we’d been neglecting in favor of more business-critical tasks: trying out the AWS DeepLens, Amazon’s machine learning video camera. The basic concept is simple. First, choose a machine learning model, which is then deployed to the device. Ready-made options include a model for detecting hot dogs or, perhaps more usefully, faces. Then, an AWS Lambda function processes each frame of the video feed, performing model inference on the image data. The resulting inference stream is published to AWS Internet of Things (IoT), where it’s easily viewable in the AWS console. Since the sample projects come equipped with pre-written Lambda functions, this can theoretically be done with just a few clicks and a bit of configuration.
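For a sense of what those pre-written Lambdas actually do, here’s a condensed sketch of the inference loop, modeled on the AWS sample projects. The awscam and greengrasssdk modules come preinstalled on the device; the model path, input size, and IoT topic below are illustrative assumptions, not the exact values from any one sample:

```python
import json

import awscam
import cv2
import greengrasssdk

# Publish inference results over the local Greengrass connection to AWS IoT
client = greengrasssdk.client('iot-data')
iot_topic = '$aws/things/deeplens_sample/infer'  # hypothetical topic name

# Load the optimized model artifact; path and loading config are illustrative
model = awscam.Model('/opt/awscam/artifacts/model.xml', {'GPU': 1})

while True:
    ret, frame = awscam.getLastFrame()       # grab the latest frame from the camera
    if not ret:
        continue
    resized = cv2.resize(frame, (224, 224))  # match the model's expected input size
    raw = model.doInference(resized)         # run the model on the frame
    results = model.parseResult('classification', raw)
    client.publish(topic=iot_topic, payload=json.dumps(results))  # send to AWS IoT
```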
But I’m getting ahead of myself. Before the DeepLens fell into my possession, it had already been opened and configured several months prior. While setting up the device, I realized that a password had been set and, after an unfortunately timed hard drive wipe, lost. Theoretically, one could use the camera without ever needing its password. The device runs Ubuntu, and the password is only needed to log onto the machine itself. That said, I had a sneaking suspicion that direct access to the device would be essential to doing anything interesting. Spoiler alert: I was right. So I set about performing a factory reset. With physical access to the device, I assumed this would be relatively straightforward.
I was wrong. Thus began a string of requests to our IT department, each more baffling than the last. Getting onto the device requires either SSH or a wired connection to a monitor. Since I had no password, SSH was out of the question. My request for an HDMI to micro-HDMI cable was met with perplexed stares, so I resorted to ordering one on Prime Now. A factory reset requires a 32GB USB drive with two partitions: a bootable Ubuntu ISO image on one and the reset script on the other. A bit weird, perhaps, but doable. Then I realized that the second partition requires an NTFS filesystem, which the Macs that power our site can read but not write. I was five minutes from commandeering a Windows machine from the Legal team upstairs when I stumbled upon a solution: I used the bootable partition to restart the device and log on, then reformatted the second partition using the DeepLens itself. One restart later and things were good to go, albeit after a full day of unexpected headaches.
I was eager to try out one of the sample projects provided by AWS, so I immediately deployed the ‘Cat and Dog’ model to the camera, hoping to determine that all of the animals in the office were, in fact, dogs. And indeed, it worked! Well, it seemed to. Firewalls prevented me from viewing the video stream in a browser, and I sincerely wanted to avoid bothering IT again. Instead, I hooked the device back up to my monitor and viewed the stream there. Sure enough, it successfully determined that Happy is a dog and not a cat. Well done, Amazon!
While this milestone felt momentous, it was hardly what I set out to do. My main goal was to train a custom model and deploy it to the camera. And I had the perfect idea. You see, our office layout was reshuffled several months ago, and my two derpy Bernese Mountain Dog friends were moved to the floor below mine. I was, naturally, devastated. I coped by visiting their floor several times a day, hoping for a giant, fluffy hug, but was often met with emptiness. The DeepLens was clearly designed to solve my problem. I envisioned setting the camera up on the fourth floor, fixed on the Berners’ pens. I could configure the Lambda to email me when one of the dogs arrived, letting me know that it was time for a visit. My days of aimlessly wandering the floor would be over.
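I never got as far as wiring this up, but the alerting piece could be a few lines added to the inference Lambda: when the model reports the Berner class above some confidence threshold, publish to an Amazon SNS topic with an email subscription. Everything in this sketch, from the topic ARN to the label index and threshold, is a placeholder:

```python
import boto3

sns = boto3.client('sns')

BERNER_LABEL = 0   # assumed index of the Bernese Mountain Dog class
THRESHOLD = 0.80   # assumed confidence cutoff
TOPIC_ARN = 'arn:aws:sns:us-west-2:123456789012:berner-alerts'  # placeholder ARN

def maybe_alert(results):
    """results: the classifier output, a list of {'label': int, 'prob': float}."""
    for result in results:
        if result['label'] == BERNER_LABEL and result['prob'] >= THRESHOLD:
            # SNS fans this out to the topic's email subscribers
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject='Berner detected on the fourth floor!',
                Message='Time for a visit.',
            )
            return
```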
Step one was to train the new model. With limited training data, I opted to focus on detecting whether a photo that already contained a dog specifically contained a Bernese Mountain Dog. I used Amazon SageMaker to train its built-in image classification model, which was a mostly painless process. I then modified the Lambda written for the Cat and Dog project, changing a few lines of code so that it would use my custom model for the inference step. I deployed the new project to the device and hoped for the best.
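For the curious, the training step looks roughly like the sketch below. It assumes the labeled images have already been uploaded to S3 in a format the built-in algorithm accepts; the role ARN, bucket names, and hyperparameter values are placeholders, and the exact SDK calls may differ by SageMaker SDK version:

```python
import sagemaker
from sagemaker import image_uris

session = sagemaker.Session()
role = 'arn:aws:iam::123456789012:role/SageMakerRole'  # hypothetical execution role

# Look up the container image for the built-in algorithm in this region
image = image_uris.retrieve('image-classification', session.boto_region_name)

estimator = sagemaker.estimator.Estimator(
    image,
    role,
    instance_count=1,
    instance_type='ml.p2.xlarge',
    output_path='s3://my-bucket/berner-model/',  # hypothetical output bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    num_classes=2,             # Berner vs. not-Berner
    num_training_samples=500,  # hypothetical dataset size
    image_shape='3,224,224',
    epochs=10,
)
estimator.fit({
    'train': 's3://my-bucket/train/',            # hypothetical input channels
    'validation': 's3://my-bucket/validation/',
})
```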
Unsurprisingly, it didn’t work. Though the console claimed the project was successfully deployed, the inference Lambda produced no output. Nothing in tech functions perfectly the first time around, and I was ready to start debugging. But this is where I hit another snag. While the instructions were clear and straightforward, they provided very little guidance when it came to troubleshooting. I tried several different approaches, including manually loading the model onto the device and switching out the Lambda functions, but the result was always the same: The camera light was off, and my Lambda produced no results.
Eventually, I found the Lambda logs. Their output, combined with what I had learned while troubleshooting, clarified the problem immediately: the model components were improperly named. Although I had used Amazon SageMaker to train and import the model, the name I gave it at import was not applied in the deploy step. I had actually discovered this already, during my earlier troubleshooting attempts: rather than being prefixed with the name I provided, the files were named after the model type, ‘image-classification.’ Since the Lambda loads the model by name, this was an obvious problem. However, aligning the model name on the device and in the Lambda didn’t solve the problem either. The logs showed that not only were the filename prefixes wrong, but the suffixes were as well. After deploying the model from the console, I had to log into the device and manually rename the model components so that they matched what the model optimizer function expected. After that small, obscure change, everything worked.
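For anyone fighting the same fight, here’s roughly what the fix looked like. This is a hedged sketch: the artifact directory is the DeepLens default, ‘berner’ is a stand-in for whatever name your Lambda uses, and the epoch suffix on the .params file (-0010 here) depends on how many epochs your training run lasted:

```python
import os

ARTIFACT_DIR = '/opt/awscam/artifacts'  # default DeepLens deploy location
MODEL_NAME = 'berner'                   # stand-in for the name the Lambda expects

# The model optimizer looks for <name>-symbol.json and <name>-0000.params,
# but SageMaker's artifacts arrive named after the model type and final epoch.
renames = {
    'image-classification-symbol.json': MODEL_NAME + '-symbol.json',  # fix the prefix
    'image-classification-0010.params': MODEL_NAME + '-0000.params',  # fix prefix AND suffix
}
for old, new in renames.items():
    os.rename(os.path.join(ARTIFACT_DIR, old), os.path.join(ARTIFACT_DIR, new))
```

Once the names line up, the Lambda’s call to the on-device model optimizer (something like `mo.optimize(MODEL_NAME, 224, 224)`) can find its inputs and produce an artifact the inference step can load.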
Unfortunately, my model was terrible. This first pass was meant to be a proof of concept, as I spent less than an hour gathering and preparing the data for training. When Fondue the Bernese Mountain Dog came upstairs to visit, the DeepLens had no idea she was there. In fact, everything was definitively identified as Not Fondue. In a glorious, ideal world in which my first attempt took less than a day, I would be eager to hunker down and more carefully train my fluffy dog detection model. But now? I’m exhausted. I think I’ll stick to improving our machine learning platform, with the occasional jaunt downstairs to search for the derps.