World-class ML-based services to the real estate market
Esoft has an ambition to provide an extensive set of world-class ML-based services to the real estate market. These services range from image classification to image manipulation and text generation. Machine learning can be split into two phases: 1) Training and 2) Inference.
Training is the act of learning from data; inference is the act of applying a trained model to produce a prediction.
In this post, we discuss how to optimize machine learning inference, which is extremely interesting because it represents the bulk of monetary cost in production deployments. To convince you of that claim, consider that one typically utilizes costly hardware (GPUs, TPUs and such) to train models on very large datasets. This is expensive, but only needed periodically and for a relatively short amount of time. On the other hand, inference is needed for every prediction, typically with low latency and at large scale.
How to choose between the frameworks
To efficiently work with deep neural networks, one needs to utilize frameworks such as Keras, MxNet and/or PyTorch. There is a long and ongoing debate on how to choose between these frameworks, and there are many factors to take into account when making a choice, such as popularity, maturity, availability of pretrained models, and support and skills within the organization.
Figure from Kortlink
Training and inference used to be tightly coupled, which means you needed to do inference using the same framework as for training. This led to unfavorable situations, as the training environment typically differs significantly from the production/live environment. Luckily, there are now techniques that allow decoupling of training and inference, which means we are free to use the best framework for training on a case-by-case basis without considering the deployment of inference. One of the most noteworthy efforts is the Open Neural Network eXchange (ONNX), which allows us to train a model in one framework and transfer it to another framework for inference.
Figure from Xenonstack
Deploy inference using a GPU
A single prediction in a deep neural network can entail millions of floating point operations, which is computationally very expensive. As it turns out, DNN frameworks are typically designed for very efficient training and inference on costly hardware (such as GPUs), making performance somewhat disappointing on cheaper hardware (such as CPU-only machines). A straightforward solution is to deploy inference using a GPU, and indeed that works very well, but it is an expensive approach, which can easily cost thousands of dollars per year per model in a public cloud.
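To get a feel for the numbers: a single dense layer with n inputs and m outputs already costs roughly 2·n·m floating point operations per prediction (one multiply and one add per weight). A back-of-the-envelope sketch in plain NumPy, with hypothetical layer sizes for a mid-sized fully connected network:

```python
import time
import numpy as np

# Hypothetical layer sizes (n inputs, m outputs) per dense layer.
layers = [(4096, 4096), (4096, 4096), (4096, 1000)]

# Each dense layer costs ~2*n*m FLOPs (one multiply and one add per weight).
flops = sum(2 * n * m for n, m in layers)
print(f"FLOPs per prediction: {flops:,}")  # tens of millions

# Time one forward pass on CPU with random weights.
x = np.random.rand(1, 4096).astype(np.float32)
weights = [np.random.rand(n, m).astype(np.float32) for n, m in layers]
start = time.perf_counter()
for w in weights:
    x = np.maximum(x @ w, 0.0)  # matmul followed by ReLU
elapsed = time.perf_counter() - start
print(f"CPU forward pass: {elapsed * 1000:.1f} ms")
```

Real convolutional networks for computer vision typically require far more operations per prediction than this toy example, which is why naive CPU inference can be disappointingly slow.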
If one can fully utilize the GPU, the cost is typically very reasonable, but often it is not possible, leading to over-provisioning of computation resources. In such cases, it is sometimes a good approach to assign a partial/shared GPU (e.g. AWS Elastic Inference), thus lowering cost.
So far, we have discussed using standard DNN frameworks with costly hardware to do inference in a timely manner. This works very well, but for some tasks we can actually do better. There are efforts to build (context-specific) inference engines. One example of such an engine is Intel OpenVINO, which specializes in executing computer vision models on Intel CPUs. The Intel OpenVINO toolkit consists of a model optimizer and an inference engine. The model optimizer takes many different types of models as input, and most importantly it supports the ONNX format. The output of the model optimizer is a new model, which is then used by the inference engine. Using Intel OpenVINO allows us to get sub-100 ms inference on very deep neural networks using cheap hardware (CPU only). The added benefit is that the model is extremely easy to manage and scale.
What’s in it for you as a customer of Esoft?
As mentioned in the beginning, Esoft has an ambition to provide an extensive set of world-class ML-based services to the real estate market. To put it briefly, we want to deliver quick answers at the right price. It is simple to create something that works, but it isn't simple to create something that works optimally.
Why? To offer our customers the best and most effective solutions.
Fortunately, we have a great research and development team that works very hard to make sure Esoft stays on top as one of the world's leading tech companies within real estate. Our mission is to make it easier to market real estate, and we will help change the real estate industry through technology.
Always decouple training and inference, as the training and inference environments are usually very different. Once a deep neural network has been trained and needs to be deployed for inference, there are two main ways to achieve low-latency inference: utilizing costly hardware, or using a dedicated inference engine. Often it is impossible to fully utilize an entire GPU for inference, in which case it can be beneficial to assign only a partial/shared GPU. Also, remember that not all inference engines are equal. There are context-specific engines, which are highly optimized for specific tasks and hardware.
Thanks for reading!
Written by our Head of Research & Development, Kasper Grud Skat Madsen
. . .
Go follow Kasper on LinkedIn.