Here’s some transparency into the technical decisions that have driven our technology. Also some insight into the things that our stack has made easy, and also some things our tech choices have made hard…
Hopefully, this gives you an idea of some of the relevant technologies in our field. From talking to fellow startup founders, this stack is pretty similar across a lot of other Machine Learning focused data teams, with some variations from industry and personal circumstance.
We use Python for basically everything. When a service isn’t keeping up, whether because it wasn’t built well or because Python just can’t serve it efficiently, we think about rewriting it in C++, with the Python version kept as a reference implementation. We use Python most heavily for our modeling and ETL. We’re very much a data company, so of course there’s SQL everywhere.
We like TensorFlow for its great documentation and the large pool of developers already familiar with it. That said, PyTorch is starting to make real headway, especially with all the great work Facebook Research has done recently. For now, TensorFlow is the majority of our stack, but I see nothing wrong with TF and PyTorch mingling in the future.
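To give a flavor of why TensorFlow’s Keras API keeps us productive, here’s a minimal sketch of a model definition; the layer sizes and the binary-classification setup are purely illustrative, not our actual architecture.

```python
# A minimal Keras model sketch (illustrative sizes, not a real model of ours).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),                       # 10 input features
    tf.keras.layers.Dense(64, activation="relu"),      # one hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),    # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

A few lines gets you a compiled, trainable model, which is a big part of why the framework’s popularity is self-reinforcing: most new hires can read this immediately.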
Scikit-Learn pops up all over the place. Its ease of use is undeniable, and it’s seen in production at companies across the industry. It’s really the bread and butter of the non-deep-learning work we do on the ML side.
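That ease of use is easiest to show rather than tell. Here’s a quick sketch of the kind of scikit-learn pipeline we reach for; the data is synthetic and the model choice is just illustrative.

```python
# A toy scikit-learn pipeline: scale features, then fit a linear classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # 200 samples, 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic binary target

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)
print(f"train accuracy: {clf.score(X, y):.2f}")
```

Preprocessing and model travel together in one object, so the same `clf` can be pickled and shipped to production without anyone re-implementing the scaling step.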
Quite frankly, I’m not a frontend dev, so I won’t waste much of your time on this section. Our first hire liked Vue.js, so that’s the direction we went originally. He thought React/Next.js made more sense for another part of the codebase, so that’s how that happened. We’re incredibly pleased with the work our devs have done, and we love Next.js for its SEO friendliness.
Our stack lives on Azure, so SQL Server seemed to make the most sense up front: the two are tightly integrated, which helps from both a cost and an ease-of-use perspective. Other than the social pressure of “you’re not using MySQL or PostgreSQL???”, it’s doing everything we need, for now. We’re eyeing a potential move to PostgreSQL.
We’re slowly moving into Databricks as our data volume grows. I’m a huge Databricks fan, and a big Snowflake fan too, so the two partnering up is awesome. Databricks is spectacular, and I expect to bring it fully online very soon.
Until recently, I was almost exclusively using Python + SQL (via SQLAlchemy) for most ETL tasks. Then someone I know pushed me to check out Airflow, and all of a sudden it’s becoming part of our stack. Scheduling workflows feels much more natural in Airflow than stringing together cron jobs.
We use GitHub. Shocker.
We use Notion, and I find myself using it for more than just project management and tracking. I use it for personal accountability and really just as a technical diary: I’m able to keep track of what I do on a daily basis, make sure I’m allocating time efficiently, and see what everyone on the team is up to and where I can help out or make someone’s life easier.
We love Streamlit; it helps us demo models and API endpoints in very little time. I dig Plotly and Altair at the moment for their ease of use on non-public projects. Plotly gives us a bunch of flexibility and features without much effort. Altair gives us more in-depth features and customization for extra effort.
For business metrics and keeping track of revenue, churn, GA, etc., we love throwing stuff into Tableau. It’s an easy way for our non-technical folks to dig straight into the analytics.
You can check out our product here.
Let’s continue the conversation on Twitter!