A few weeks ago, I finally finished my master’s thesis. I decided to keep the momentum going and start developing a mobile app for image processing using machine learning methods.
Since I want a working app fairly quickly, to see progress and stay motivated, I decided to use some existing models first and build the app around them, then integrate other models and optimize them later.
So my plan is to first find a pre-trained model that performs Super Resolution, then deploy it somewhere, and then build an app that uses that model. Super Resolution means upscaling a low-resolution image to a higher resolution.
In the long term, the app should allow users to improve their images: changing the style, increasing the resolution, improving the general appearance, and so on.
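To give an idea of what the model side looks like, here is a minimal sketch of loading and running a pre-trained Super Resolution model in Python. I’m using the ESRGAN model from TensorFlow Hub as a stand-in; the actual model and its pre- and post-processing may end up different.

```python
# A minimal sketch, assuming the ESRGAN model from TensorFlow Hub.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image

model = hub.load("https://tfhub.dev/captain-pool/esrgan-tf2/1")

def upscale(image_path: str, output_path: str) -> None:
    image = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32)
    # Crop so height and width are divisible by 4, as the TF Hub example does.
    h, w = image.shape[0] - image.shape[0] % 4, image.shape[1] - image.shape[1] % 4
    image = image[:h, :w]
    # The model expects a float32 batch of shape (1, height, width, 3).
    upscaled = model(tf.expand_dims(image, axis=0))
    # Clip to the valid pixel range, drop the batch dimension, and save.
    upscaled = tf.cast(tf.clip_by_value(tf.squeeze(upscaled, axis=0), 0, 255), tf.uint8)
    Image.fromarray(upscaled.numpy()).save(output_path)
```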
Sounds simple, right? But even finding a free cloud provider isn’t that easy.
ML is usually done on GPUs, but GPUs are expensive, and there are not many cloud providers that offer access to a GPU in a serverless way. However, I don’t want to train the model in the cloud; I only want to use it for inference, so we don’t strictly need a GPU.
So I compared several cloud providers, specifically Google Cloud Platform (GCP), Microsoft Azure, and Amazon Web Services (AWS). Since GCP provides free credit, works well with Firebase, and is the platform I’m most familiar with, I decided to go with it.
I use GCP Cloud Functions to run the backend. Cloud functions are serverless: an instance is started when a request comes in, the upscaling is performed, and after some time the instance is shut down. The downside is that starting an instance takes some time (a cold start). On the upside, we only pay when an image actually needs to be upscaled, which is critical for keeping costs down while we don’t have many customers.
Cloud functions can be invoked via REST and other methods. However, since upscaling an image takes time, making a synchronous request from the mobile device doesn’t make sense.
Another option that cloud functions offer is triggers, which let us invoke the function whenever an object is created in Cloud Storage. I decided to use this option: when a customer uploads an image to Cloud Storage, the function is executed, and the result is stored back in Cloud Storage.
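To make this concrete, a storage-triggered cloud function looks roughly like the sketch below. The bucket names are placeholders, and `upscale()` is the helper from the model sketch above.

```python
# Sketch of a storage-triggered Cloud Function (1st gen, Python runtime).
import os
import tempfile
from google.cloud import storage

RESULT_BUCKET = "my-app-upscaled-images"  # placeholder output bucket

# Created at module level so the client is reused across invocations.
storage_client = storage.Client()

def upscale_image(event, context):
    """Triggered when an object is created in the upload bucket."""
    source_bucket = storage_client.bucket(event["bucket"])
    blob_name = event["name"]

    # Download the uploaded low-res image to a temp file.
    local_in = os.path.join(tempfile.gettempdir(), os.path.basename(blob_name))
    local_out = local_in + "_upscaled.png"
    source_bucket.blob(blob_name).download_to_filename(local_in)

    # Run the super-resolution model and store the result.
    upscale(local_in, local_out)
    storage_client.bucket(RESULT_BUCKET).blob(blob_name).upload_from_filename(local_out)
```

Deploying the function together with its trigger is a single command, e.g. `gcloud functions deploy upscale_image --runtime python311 --trigger-bucket my-app-uploads` (again with a placeholder bucket name).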
The user can then see all the completed upscaled images in the app. At the moment, the user is not notified when an image has been upscaled. A notification would make the app more convenient, but it is not necessary for the app’s main purpose, so I will add this feature later.
Since I’m using cloud services, at some point I’ll have to pay for them, so I monetize the app with ads via AdMob and with in-app purchases. Specifically, each user has some free credits that they can use to upscale images up to a certain size. To upscale more images or larger images, the user has to buy Pro Tokens.
To store the number of credits and other user information, I also use Firebase.
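For illustration, checking and spending credits can be done atomically with a Firestore transaction, roughly like this sketch. It uses the google-cloud-firestore client, which the Firebase Admin SDK wraps; the "users" collection and "credits" field are assumed names, not necessarily my actual schema.

```python
# Sketch: atomically check and deduct a user's credits in Firestore.
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def spend_credits(transaction, user_id: str, cost: int) -> bool:
    """Returns False if the user cannot afford the upscale."""
    user_ref = db.collection("users").document(user_id)
    snapshot = user_ref.get(transaction=transaction)
    credits = (snapshot.to_dict() or {}).get("credits", 0)
    if credits < cost:
        return False  # out of credits: the app would prompt the user to buy Pro Tokens
    transaction.update(user_ref, {"credits": credits - cost})
    return True

# Usage inside the cloud function, before running the model:
# allowed = spend_credits(db.transaction(), user_id, cost=1)
```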
For the backend running on GCP, I use Python, because the model I use is in TensorFlow and Python is probably the best language for machine-learning use cases.
For the mobile app development, I use React Native, because it’s pretty easy to develop an app with it, and it also allows us to quickly release the app for both iOS and Android.
Other tools I use are:
PyCharm for backend development
WebStorm for app development
BitBucket as a version control system
Jira for planning
Please note that this is just a brief description of the Vision AI app I am working on. While it contains some details, it is not detailed enough to be a tutorial. My goal with this post is to document the process and improve my writing skills in general.