Data Science to minimize cash burn in startups with Smart Marketing


Cycles of boom and bust are not uncommon in the world of business: the ripple effects of the peaks cause the troughs, and those who survive a trough rise to build the next peak. After a booming year for the startup ecosystem last year, this year may bring a reversal.

“We haven’t seen a slowdown like this in at least five to six years. It is going to be brutal,” said Anand Lunia of venture capital firm India Quotient, an investor in more than 70 startups since 2012.

Most startups burn cash heavily on marketing to meet the high growth expectations they quoted to investors while raising funds. Investors have now begun asking high-growth companies to go back to basics: chase profits and reduce cash burn.

In times like these, having a strong data person who can set up a smart marketing system is crucial. A data scientist can help the company minimize cash burn (potentially saving millions of dollars) with minimal impact on growth numbers.

One-liner: identify the high-propensity users to target for a campaign. Secondary goal: maximize engagement and minimize the cost of targeting.

Marketing and discounting happen everywhere, be it a food delivery company offering a pro subscription, a ride-hailing company giving you monthly ride passes, or a fintech company targeting its customers with credit or loan offers.

Users are targeted mainly through four channels: calls, email (or LinkedIn InMail), WhatsApp, and push notifications.

Connecting with Users (Image by Author)

In general, the placement of cost and impact shown above holds, but the cash burn and conversion impact of each channel may vary for some businesses.

In many startups, marketing cost >>> employee cost. Now you know where optimisation can happen XD

Let’s consider LinkedIn users for our campaign targeting. I shall share how we can implement smart marketing on LinkedIn InMail.

Scrape and collect user data from LinkedIn (image below):

User Data (Image by Author)

Label-encode the last two fields:

job_level is extracted from the job title and represents the seniority of the user. Values range from 1 to 7 (e.g., VP, Director, CTO, Junior, Senior).
job_domain is extracted from the job title and represents the functional domain (e.g., engineering, marketing, human resources). Values range from 1 to 20.

Feature Engineering on User Data (for additional features)

[1] Gender prediction from name. Highlights from the paper "Gender Inference using Statistical Name Characteristics in Twitter":

Number of syllables: Female names tend to have more syllables than their male counterparts.
Number of consonants: Male names tend to contain more consonants than female names.
Number of vowels: Female first names tend to contain more vowels than male names.
Vowel brightness: Female names contain more brightly emphasized vowels than male names.
Ending character: Female names end more often with a vowel while male names tend to end with a consonant.
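The name characteristics above can be turned into numeric features with a few lines of code. The following is a minimal sketch, not the paper's method: the function name `name_features` is hypothetical, the syllable count is a rough approximation (runs of consecutive vowels, treating "y" as a vowel), and vowel brightness is left out because it needs a phonetic mapping.

```python
VOWELS = set("aeiou")


def name_features(first_name: str) -> dict:
    """Compute simple character-level features from a first name."""
    # Keep letters only, lower-cased, so "Anne-Marie" and "anne marie" match.
    name = "".join(ch for ch in first_name.lower() if ch.isalpha())
    n_vowels = sum(ch in VOWELS for ch in name)
    n_consonants = len(name) - n_vowels

    # Rough syllable estimate (an assumption of this sketch):
    # count each run of consecutive vowels once, treating 'y' as a vowel.
    syllables = 0
    prev_is_vowel = False
    for ch in name:
        is_vowel = ch in VOWELS or ch == "y"
        if is_vowel and not prev_is_vowel:
            syllables += 1
        prev_is_vowel = is_vowel

    return {
        "n_syllables": syllables,
        "n_vowels": n_vowels,
        "n_consonants": n_consonants,
        "ends_with_vowel": int(bool(name) and name[-1] in VOWELS),
    }


features = name_features("Maria")
```

These features would then feed a separate classifier trained on names with known gender; no such labelled data is shown here.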

[2] Extract email domain: for an address like user@xyz.com, xyz is the company name; then identify the company category: Fintech, Adtech, Healthcare, Commerce, Entertainment, etc. (The fill rate will be low, since many users enter a personal Gmail address, so ignore the Gmail cases.)
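As a rough sketch of this step, the snippet below pulls the company token out of an email address and skips personal mailboxes. The list of personal domains and the `COMPANY_CATEGORY` lookup are illustrative assumptions; in practice the category would come from a maintained mapping or an external data source. Taking the first label of the domain is also a simplification that will misread addresses such as mail.xyz.com.

```python
# Assumed list of personal email providers to ignore (extend as needed).
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}

# Hypothetical company -> category lookup, for illustration only.
COMPANY_CATEGORY = {"xyz": "Fintech"}


def company_from_email(email: str):
    """Return the company token of a work email, or None if personal/invalid."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or "." not in domain:
        return None  # not a usable email address
    if domain in PERSONAL_DOMAINS:
        return None  # personal mailbox: leave the feature empty
    return domain.split(".")[0]


def company_category(email: str) -> str:
    """Map an email to a company category, defaulting to 'Unknown'."""
    company = company_from_email(email)
    return COMPANY_CATEGORY.get(company, "Unknown")


category = company_category("user@xyz.com")
```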

[3] Job title standardization and encoding: titles such as Software Engineer, Data Scientist, DevOps, Machine Learning Engineer, etc. should be standardized with some data cleaning and then encoded with a categorical encoding technique: label encoding (if the feature is used in a tree-based model) or one-hot encoding.
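A minimal sketch of the cleaning and encoding step follows. The alias map `TITLE_ALIASES` and all function names are hypothetical; a real pipeline would need a much larger alias list, and a library encoder could replace the hand-written ones.

```python
import re

# Hypothetical alias map from messy titles to a canonical title.
TITLE_ALIASES = {
    "sde": "software engineer",
    "software developer": "software engineer",
    "ml engineer": "machine learning engineer",
    "dev ops": "devops",
}


def standardize_title(raw: str) -> str:
    """Lower-case, strip punctuation and extra spaces, then apply aliases."""
    title = re.sub(r"[^a-z ]+", " ", raw.lower())
    title = re.sub(r"\s+", " ", title).strip()
    return TITLE_ALIASES.get(title, title)


def label_encode(values):
    """Map each distinct value to an integer id (sorted for reproducibility)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 column per distinct value, plus the column order."""
    classes = sorted(set(values))
    return [[int(v == c) for c in classes] for v in values], classes


raw_titles = ["SDE", "Software  Developer", "ML Engineer", "Data Scientist"]
titles = [standardize_title(t) for t in raw_titles]
codes, mapping = label_encode(titles)
one_hot, columns = one_hot_encode(titles)
```

Label encoding keeps a single integer column, which tree-based models can split on; one-hot encoding avoids implying an order between titles at the cost of more columns.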

Campaign Data (below image):

Campaign Data (Image by Author)

Feature Engineering on Campaign Data

More information about a campaign can be extracted using zero-shot classification over a fixed set of labels. It can also give us a 100% fill rate in campaign_domain with decent accuracy.

Now we have all the user and campaign features in place. We run the campaign over a fixed, randomly selected set of users and capture their engagement response in a table, where engagement_label 1 means the user showed interest and 0 means the user did not.

Schema of Engagement_Table:

User_Id
Campaign_Id
Engagement_Label: 1 or 0
Reachout_datetime (when the user was targeted/contacted)

Training

Build a model on the engineered user and campaign features (the independent variables), with engagement_label as the target variable.

This is a binary classification problem with class imbalance, i.e., a classification data set with skewed class proportions.

MODEL OUTPUT -> Given a <User, Campaign> pair, predict whether the user will engage (1 = user will engage, 0 = user will not engage).
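The post does not say how the imbalance is handled. One common option, shown here only as an assumption, is to weight each class inversely to its frequency, using the same formula as scikit-learn's "balanced" class weights: n_samples / (n_classes * class_count). The function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy engagement labels: 10 engaged users out of 100 contacted.
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)
```

With these toy labels the engaged class gets a weight of 5.0 and the non-engaged class about 0.56, so mistakes on the rare engaged users cost more during training.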

Training Model Metrics

For an imbalanced binary classification problem like campaign targeting, recall matters more than precision from a business perspective. So use an F-beta score with beta > 1 (for example the F2 score, which weights recall more heavily than precision) along with AUC-ROC as metrics.

The business metric to improve is: number of people engaged / number of people contacted (per campaign).
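To make the two metrics concrete, here is a small dependency-free sketch. The function names and the toy labels are illustrative; in practice a library implementation would normally be used. The AUC is computed as the fraction of (engaged, not engaged) pairs in which the engaged user gets the higher score, with ties counted as half.

```python
def fbeta_score(y_true, y_pred, beta=2.0):
    """F-beta from hard 0/1 predictions; beta > 1 favours recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: precision = 0.5, recall = 0.75.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
f1 = fbeta_score(y_true, y_pred, beta=1.0)
f2 = fbeta_score(y_true, y_pred, beta=2.0)
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

Because recall is higher than precision in this toy example, F2 comes out above F1, which is the behaviour we want when missing an interested user is costlier than contacting an uninterested one.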
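The business metric can be computed directly from the engagement table. This is a minimal sketch that assumes each row is a dict with lower-cased versions of the schema's column names; the sample rows are made up.

```python
from collections import defaultdict


def engagement_rate_by_campaign(rows):
    """Return engaged / contacted for every campaign in the table."""
    contacted = defaultdict(int)
    engaged = defaultdict(int)
    for row in rows:
        contacted[row["campaign_id"]] += 1
        engaged[row["campaign_id"]] += row["engagement_label"]
    return {c: engaged[c] / contacted[c] for c in contacted}


# Made-up rows for illustration.
rows = [
    {"user_id": "u1", "campaign_id": "c1", "engagement_label": 1},
    {"user_id": "u2", "campaign_id": "c1", "engagement_label": 0},
    {"user_id": "u3", "campaign_id": "c1", "engagement_label": 0},
    {"user_id": "u4", "campaign_id": "c1", "engagement_label": 1},
    {"user_id": "u1", "campaign_id": "c2", "engagement_label": 0},
    {"user_id": "u2", "campaign_id": "c2", "engagement_label": 1},
]
rates = engagement_rate_by_campaign(rows)
```

Comparing this rate between randomly targeted users and model-selected users shows whether the model is actually reducing wasted contacts.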

Inference

Once the model is trained, find the top-K high-propensity users for every campaign by stack-ranking users on the model's probability score.
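The ranking step can be sketched as follows, assuming the model's scores are already available as a dict keyed by (user_id, campaign_id). The function name `top_k_users` and the scores are hypothetical.

```python
import heapq


def top_k_users(scores, k):
    """Return the k highest-scoring user ids for each campaign.

    scores: dict mapping (user_id, campaign_id) -> predicted probability.
    """
    by_campaign = {}
    for (user_id, campaign_id), prob in scores.items():
        by_campaign.setdefault(campaign_id, []).append((prob, user_id))
    return {
        campaign_id: [user for _, user in heapq.nlargest(k, pairs)]
        for campaign_id, pairs in by_campaign.items()
    }


# Made-up model scores for illustration.
scores = {
    ("u1", "c1"): 0.9,
    ("u2", "c1"): 0.2,
    ("u3", "c1"): 0.6,
    ("u1", "c2"): 0.1,
    ("u2", "c2"): 0.7,
}
targets = top_k_users(scores, k=2)
```

K can be set from the campaign budget: the lower the budget per campaign, the smaller the slice of the ranked list that gets contacted.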

Model Design Pipeline (Image by Author)

This model gives us a stack-ranked list of users for every campaign, so we can target the top-K high-propensity users. There is more scope here: one may add lag features such as the past engagement average per campaign category, user lag features, behavioral features of a user towards one particular campaign type, etc. There is always more to it than we can cover, but do spend some time racking your brain on it!
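As one possible take on the first lag feature, the sketch below computes, for each reach-out, the user's average engagement on earlier campaigns of the same category. It uses only rows that happened before the current reach-out, so the feature does not leak the label it is meant to predict. The row layout and the feature name `past_category_engagement_avg` are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def add_lag_engagement(rows):
    """Add each user's past mean engagement for the same campaign category."""
    # (user_id, campaign_domain) -> [engaged_count, contacted_count]
    history = defaultdict(lambda: [0, 0])
    out = []
    for row in sorted(rows, key=lambda r: r["reachout_datetime"]):
        stats = history[(row["user_id"], row["campaign_domain"])]
        engaged, contacted = stats
        # None marks users with no earlier reach-out in this category.
        lag = engaged / contacted if contacted else None
        out.append({**row, "past_category_engagement_avg": lag})
        # Update the history only after the feature has been computed.
        stats[0] += row["engagement_label"]
        stats[1] += 1
    return out


# Made-up reach-outs for one user and one campaign category.
rows = [
    {"user_id": "u1", "campaign_domain": "fintech",
     "reachout_datetime": datetime(2022, 1, 1), "engagement_label": 1},
    {"user_id": "u1", "campaign_domain": "fintech",
     "reachout_datetime": datetime(2022, 2, 1), "engagement_label": 0},
    {"user_id": "u1", "campaign_domain": "fintech",
     "reachout_datetime": datetime(2022, 3, 1), "engagement_label": 1},
]
with_lags = add_lag_engagement(rows)
```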


Source: https://medium.datadriveninvestor.com/data-science-to-minimize-cash-burn-in-startups-with-smart-marketing-cb0d5356b7ee
