TezLab's Migration to Kubernetes

Ed. Note: TezLab is a companion platform on iOS and Android for Tesla owners to better understand their battery behavior, past trips, and comparative performance.

Hitting the Limits of Heroku

When HappyFunCorp started work on TezLab many years ago, we deployed on Heroku. Heroku lets you focus more on your app and less on DevOps, deployments, and all the mess that comes with them. With Heroku, a deployment is as simple as pushing new code, and operations like scaling or adding add-on services can be as simple as pushing a button.

In case you don't know, the TezLab stack is mobile apps on Android and iOS (via React Native), an in-car dashboard, and a robust Ruby on Rails backend to service it all. To get an accurate view of the fleet, we collect data not only from the cars themselves but also on local weather conditions for each car, to track the effects of temperature and barometric pressure (and elevation, which the car doesn't report) on driving performance. Combine this with Supercharger availability and performance data, and we collect and process over 10 million raw data points every day. From there we kick off our analysis functions in real time to give users the most accurate information in a digestible form as soon as possible.

Heroku scales pretty well, but your options are limited. There are only a few types of machines, and you have to use the same class of machine for all of your work. Their "auto scaling" algorithm is pretty brain-dead, and there's little transparency into how or why it adds more dynos to your system. Heroku also caps you at a maximum of 10 dynos, so if you start trying to scale portions of your app independently, you run out of options quickly. For TezLab, as traffic picked up dramatically, we needed to scale up our API servers without scaling up our web servers. On Heroku, that wasn't really an option.

The other downside of Heroku is cost: all that convenience comes at a price. Entry-level projects can be free or very cheap, but as soon as you start doing anything real with actual users, you'll want to bump up the service levels on your dynos, databases, and so on, and those tiers get expensive quickly, especially when you compare them to the cost of the underlying AWS resources.

With TezLab, we reached the point where we needed more performance, more control over that performance, and a better cost profile. Around the same time, we ran into the fun fact that Tesla was blocking API calls to its services from AWS, so we also needed a better solution than the odd proxy we had hacked together to work around that problem.

So we knew we wanted:

  • Improved costs
  • Better control
  • Independent scalability of the different aspects of our application
  • Isolation between the different areas of our application
  • No limits on scale, performance, etc.

Making the Move

The needs above called for containerizing the core components of our application and finding a solid place to run them. For containerization, Docker remains the king. For running containers, there are many options, including Mesosphere DC/OS, DigitalOcean, Amazon ECS, and others. But there are also a number of options based on Kubernetes (k8s for short), and k8s has become the de facto standard in container orchestration. Amazon was late to the k8s game, and we had our Tesla issues with AWS anyway. Google was an early adopter of k8s, and its offering was mature. We had a number of options, but given Google's pricing, scale, and reputation, and the fact that we already used a number of Google services, the decision to move to Google Kubernetes Engine (GKE) was a pretty easy one. Then came the hard work.

The first step was turning the various pieces of our application into containers: one for the API service, one for the web service, one for the workers, and so on. There's a bit of an art to structuring containers so that they build quickly and reliably. The initial construction can be slow and tedious, but in the end you have something that is isolated, builds fast, and contains only what it needs to do its job. To make things easier on the build (and later deploy) side, we used Capistrano, with nice add-ons like Slackistrano to notify everyone when builds finish.
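Most of the "art" comes down to layer ordering. As a rough illustration, here is a minimal sketch of a Rails-style Dockerfile; the Ruby version, package list, and server command are assumptions for the example, not our actual configuration:

```dockerfile
# Minimal sketch of a Rails service image; versions and commands are
# illustrative, not TezLab's actual configuration.
FROM ruby:2.5-slim

# Native build tools and Postgres client headers for common gems.
RUN apt-get update && \
    apt-get install -y build-essential libpq-dev && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy the dependency manifests first so the expensive `bundle install`
# layer stays cached unless the Gemfile changes.
COPY Gemfile Gemfile.lock ./
RUN bundle install --jobs 4

# Application code goes in last; editing it won't invalidate the gem layer.
COPY . .

EXPOSE 3000
CMD ["bundle", "exec", "puma", "-C", "config/puma.rb"]
```

Ordering the file this way is most of what "builds well and fast" means in practice: the slow steps land in layers that rarely change, so day-to-day rebuilds only re-run the cheap ones.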

Another issue we faced was our reliance on a number of third-party offerings on Heroku, including New Relic and others. We also had a lot of data in a Heroku Postgres database that was costing us a lot of money to run.

We moved the database first. There are non-disruptive ways to move a database across cloud providers, but we'll admit we made the call to take a short downtime to make the cutover less painful for us. We stood up a Google Cloud SQL Postgres database, disabled writes on the Heroku one, did the copy, and then switched our Heroku app to point at the copied database in Google. This reduced cost quickly, but it left us with a cross-cloud performance problem. It's not as bad as you might think, since a lot of traffic crosses between AWS and Google with very low latency, but it still isn't ideal.
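The cutover itself amounts to only a handful of steps. A sketch of the general shape, with placeholder app names and connection strings, looks something like this:

```sh
# Sketch of the cutover; app names and connection URLs are placeholders.
heroku maintenance:on --app our-heroku-app     # stop accepting writes

# Dump the Heroku Postgres database...
pg_dump "$HEROKU_DATABASE_URL" --format=custom --no-owner -f tezlab.dump

# ...restore it into the new Cloud SQL Postgres instance...
pg_restore --no-owner --dbname="$CLOUD_SQL_DATABASE_URL" tezlab.dump

# ...then point the app at the copy and resume traffic.
# (Overriding DATABASE_URL may require detaching the Heroku Postgres
# add-on first; this is the general shape, not a literal transcript.)
heroku config:set DATABASE_URL="$CLOUD_SQL_DATABASE_URL" --app our-heroku-app
heroku maintenance:off --app our-heroku-app
```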

Another challenge was secrets management. Heroku takes a simple, open approach to secrets (config vars), which many teams manage directly in the UI. We needed many of the same secrets over on the k8s side. We solved that problem by writing a quick conversion script that fetched the Heroku secrets and created k8s secrets from them at build time, until we were ready to cut that cord. Now we were ready to start getting code running in Google.
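A script like that only needs a few lines. Something along these lines works, assuming the `heroku`, `jq`, and `kubectl` CLIs are available; the app and secret names here are placeholders, not our real ones:

```sh
# Convert Heroku config vars into a Kubernetes Secret.
# App and secret names are placeholders.
heroku config --json --app our-heroku-app \
  | jq -r 'to_entries[] | "\(.key)=\(.value)"' > .env.heroku

# Create (or update) the secret; --dry-run + apply keeps it idempotent.
kubectl create secret generic tezlab-env \
  --from-env-file=.env.heroku \
  --dry-run=client -o yaml \
  | kubectl apply -f -

rm .env.heroku
```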

Google Kubernetes Engine is pretty easy to set up following their quickstart. It takes a bit of time to get used to all the commands and terminology, but most of it is easy to grasp. The hard part is creating the YAML config files for your services. We also had a heck of a time configuring the various firewall/ingress rules to allow a broad range of iOS and Android devices into our containers. After getting our services created and deployed to our new k8s cluster, we had the database, services, and config in k8s, plus a few leftover third-party services provided by Heroku. We slowly cut ties with those over time as we found better options (like Stackdriver logging).
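To give a sense of what those YAML files look like, here is a stripped-down Deployment plus Service of the sort each component needs; every name, image path, replica count, and port in it is illustrative rather than TezLab's actual config:

```yaml
# Stripped-down Deployment + Service sketch; all names and numbers
# are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3                      # scale this tier independently
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: gcr.io/our-project/tezlab-api:latest
          ports:
            - containerPort: 3000
          envFrom:
            - secretRef:
                name: tezlab-env   # the secret converted from Heroku
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: LoadBalancer               # external entry point for mobile clients
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 3000
```

With one of these per component, scaling the API tier independently of the web tier, the very thing that pushed us off Heroku, is just a matter of changing `replicas` on one Deployment (or attaching an autoscaler to it).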

Conclusion

TezLab has been running in Google Kubernetes Engine since the middle of 2018, and we've enjoyed the flexibility it provides to rebalance load, add capacity, and more, all while optimizing what we spend on the whole project. The jump from Heroku to k8s came a little late for us, but it was a great move for TezLab and our users.
