The Sweet Spot
Suppose I give you a simple problem statement: Predict the next number in the sequence
2, 4, 6, 8, 10, __?
Easy peasy, right? I'm not going to insult your intelligence by telling you the answer. I mean, come on.
Now let's take another sequence:
8, 13, 21, 34, 55, __?
Not as easy as the first one, is it? Well, if you're still scratching your head, it's the Fibonacci sequence, where each number is the sum of the previous two. So the next number is 89.
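The rule behind the sequence, once spotted, is trivial to express in code. Here's a minimal sketch in Python (the function name is my own, just for illustration):

```python
def next_fibonacci(seq):
    """Given the last few terms of a Fibonacci-style sequence,
    return the next term: the sum of the previous two."""
    return seq[-1] + seq[-2]

print(next_fibonacci([8, 13, 21, 34, 55]))  # 89
```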
Now how about this sequence:
0, 2, 4, __?
You might be a bit confused now. Is the next number 6, where the sequence is just adding 2 to the previous number? Or is it 8, where the sequence is just powers of 2? Can you decide?
You can't really, can you? But why can't you make a decision? Suppose I add another number to the sequence:
0, 2, 4, 8, __?
Okay, so now we know. It's going to be 16, as we found out the pattern in the sequence was powers of 2.
Why couldn't we decide earlier? And why did adding one more instance instantly solve our dilemma? To put it plainly, you got more data to work with, and from that data you could extract a pattern.
This is precisely what Deep Learning models do. You feed them data and beg the wizards hiding inside the models to figure out patterns that make everything sensible. (Deep Learning is fun!)
Congratulations, you’re now a Data Scientist. Start brushing up that resume.
Why Go for Deep Learning?
One word. Accuracy.
When the problem statement is complex, there will be more exceptions than rules. This rules out a simple rule-based or algorithm-based system. And even when resorting to Machine Learning (ML), the patterns a Data Scientist finds to build the feature set for these models would have exceptions that can be very hard to tackle.
Enter the world of Deep Learning and Neural Networks.
As mentioned earlier, Deep Learning (DL) models work on labeled data. The model architecture you've designed learns patterns from that data all by itself. It does this by constantly iterating over the training data, making predictions at each iteration, and measuring the error between the prediction and the ground truth. This error signal is then used to update the parts of the network that 'learn' (which we call the 'weights' of the neurons). The process is repeated until the error between the predicted truth and the ground truth is minimal.
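That predict-measure-update loop can be sketched in a few lines. Here's a deliberately tiny toy in plain Python: a single-weight model learning the pattern y = 3x by gradient descent. The data, learning rate, and epoch count are illustrative assumptions, not a real training setup.

```python
# Toy version of the loop described above: predict, compute the error
# against the ground truth, and nudge the weight to shrink that error.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (input, ground truth); true pattern is y = 3x

w = 0.0     # the single 'weight' this toy model learns
lr = 0.05   # learning rate (how big each nudge is)

for epoch in range(200):
    for x, y_true in data:
        y_pred = w * x            # prediction
        error = y_pred - y_true   # error signal
        w -= lr * error * x       # update the weight to reduce squared error

print(round(w, 3))  # 3.0 -- the weight has learned the pattern
```

A real neural net does exactly this, just with millions of weights and the calculus (backpropagation) handled by a framework.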
Now you must be wondering where the wizards I mentioned earlier are. Not to burst your bubble, but there are no wizards inside the neural net. It's just math, particularly calculus. Sorry.
Neural nets and conversations – the ensemble way!
Now, coming to conversations. Conversational data is not something you can find in loads on the internet. Well, you can, but most of it would be either simulated, in the form of chat corpora (which are far removed from how people actually speak), or it wouldn't fit the use case we're trying to solve. And a neural net always needs data.
We can’t just go back to standard ML and rule-based approaches. They would be as helpful as having a toothbrush when you’re tasked with cleaning an entire floor. I mean, you can do it, but at what cost?
And I don’t take hallucinogenic drugs, so I’m under no delusions that a single DL model trained on limited data can give us godly results.
This is where “ensembling” various DL models and other methods can be used for exponential payoffs. Each neural net would learn different things from the data it’s exposed to. Then it’s just a matter of combining these multiple “weak learners” (“weak” because they each learn specific things, but not the entire thing) to get the best possible model.
Now, even this mighty ensemble might fail to spot simple patterns, so what do we do then? Well, since our floor is now 95% clean thanks to the deep learning ensemble, we can finally put our toothbrush to use and apply the standard, age-old techniques to weed out the rest of the dirt.
There now, all shiny isn’t it?
Sign up with Symbl to get started.