I have a seashell on my desk. It serves as a personal reminder to always think outside the box and widen my viewpoint when solving complex Big Data Analysis challenges.
In this article you will learn:
- Why do Data Scientists carry risks like fishermen?
- What is the chicken-and-egg problem of Data Science?
- How do the Chinese fishing nets in India get lifted up?
Each time I look at the shell, I’m brought back to the day I found it and to the fascinating, intriguing craft I saw there: an art refined to extract value from mass fishing with the least effort. This is pretty much where I found it:
Big Data Science in big oceans – the correlation
Procuring my very own shell was similar to uncovering, and appreciating, the value of Big Data Analysis; the shell originates from an ocean of artifacts, and the value of Big Data Analysis originates from an ocean of data.
Data Scientists cast their own lift nets. Just like these fishermen with their large nets, they fish for information in the data oceans and, once they find a valuable artifact, they proudly present it to their peers.
Perhaps the difference between these two activities is that the ‘oceans’ in which Data Scientists cast their nets are massive; I’d personally use trawling nets for my endeavours.
How does lift net architecture relate to Big Data Analysis and Data Lakes?
Finding the correlation with the ocean was my a-ha moment for Big Data Analysis. I was near Cochin, India when I saw these lift nets. I was fascinated by them and watched the fishermen work with them.
While I studied how this immense fishing tool worked, a fisherman gestured for me to come over and showed me how it works. He then gifted me my first shell.
In a (nut)shell, this lift net is an apparatus to assist in fishing, where a huge net is lowered into the water and then pulled up by a lever attached to a rope wound around a motorbike. Once lifted, the contents of the net become visible to the fisherman. See this technology in action for yourself:
This is not so different from a Big Data Analysis setup; we collect information from various systems and make it accessible for Big Data Analysis. The collection and provision of all this information is automated via Big Data Analysis tools.
To visualise this better, I have made the following picture. Here, you see the various data sources which are integrated to do Big Data Analysis.
The data is then structured, cleaned and stored in models, listed in a repository and flagged as master or transactional data for analysis. Data Scientists then use this information to examine and investigate thoroughly to make their catch worthwhile.
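The flow described above can be sketched in a few lines of Python. This is only a minimal illustration, not a real Big Data stack: the `Dataset` and `Repository` classes, the `clean` and `build_repository` functions, and the sample CRM and shop records are all hypothetical names and data invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A cleaned data set, flagged as master or transactional data."""
    name: str
    kind: str  # "master" or "transactional" (assumed labels)
    rows: list


@dataclass
class Repository:
    """A tiny in-memory catalogue that lists the available data sets."""
    entries: dict = field(default_factory=dict)

    def register(self, dataset: Dataset) -> None:
        self.entries[dataset.name] = dataset

    def by_kind(self, kind: str) -> list:
        # Names of all registered data sets with the given flag
        return sorted(n for n, d in self.entries.items() if d.kind == kind)


def clean(rows: list) -> list:
    """Trim string values and drop rows that contain missing values."""
    cleaned = []
    for row in rows:
        trimmed = {k: (v.strip() if isinstance(v, str) else v)
                   for k, v in row.items()}
        if all(v is not None and v != "" for v in trimmed.values()):
            cleaned.append(trimmed)
    return cleaned


def build_repository(sources: dict) -> Repository:
    """Clean each source and list it in the repository with its flag."""
    repo = Repository()
    for name, (kind, rows) in sources.items():
        repo.register(Dataset(name=name, kind=kind, rows=clean(rows)))
    return repo


# Hypothetical source systems: a CRM (master data) and a shop (transactions)
sources = {
    "customers": ("master", [
        {"customer_id": 1, "name": " Asha "},
        {"customer_id": 2, "name": ""},  # incomplete, dropped by clean()
    ]),
    "orders": ("transactional", [
        {"order_id": 10, "customer_id": 1, "amount": 25.0},
        {"order_id": 11, "customer_id": None, "amount": 5.0},  # dropped
    ]),
}

repo = build_repository(sources)
```

Keeping the master or transactional flag on each data set means an analyst can ask the repository what is available, for example via `repo.by_kind("master")`, before casting a net.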
From this we can draw the following conclusions:
- Using the right tool is essential to fish in the Big Data Analysis for
- Using automated processes and tools enables Data Science talent to mine data and do Big Data Analysis easily
How Data Scientists extract value from Big Data tools
In Data Science, we deal with information of high velocity, great variety and massive volume, which needs to be prepared and made accessible before facts can be mined from it.
Experienced Data Architects and experts determine where to install the right tools to extract and integrate the greatest amount of information. Data Engineers then construct Big Data Analysis tools which, like the lift net, integrate and give access to the information stored in the ocean.
Although talent is scarce and these infrastructures are built by a small pool of people, Data Engineers construct automated processes to extract, monitor, mesh and model data, and they provision the tooling needed to investigate it.
Most of the time, the information is incomplete at first, and additional information needs to be extracted. Once the picture is complete, the catch can be presented to influence business decisions.
We can take the following lessons from this:
- Ensure your tooling is reliable and already in production
- Establish standard procedures how value should be extracted from the
The main challenge and the chicken-and-egg problem of Data Science
Like a fisherman, the Data Scientist, the Data Engineer and the Data Architect do not know whether they will find value in the data. All of the Data Engineer's Big Data tooling and all of the Data Scientist's explorative work is done on the assumption that they will likely find value. This makes mutual trust essential in this line of work.
Furthermore, this uncertainty makes it hard to start Data Science projects and to scale them. To get full commitment, sponsorship and unswerving support from business departments, there first needs to be evidence of value. Without that evidence and full management commitment, it is hard to get sponsors and support.
This ultimately is a chicken-and-egg problem. At sea, the fisherman's version of this problem was solved centuries ago: the fisherman carries the risk of having a catch or not. Hence, fishermen take great care of their work equipment and train themselves to use their tools perfectly.
Ultimately, there is no certainty of a catch on any given attempt. The primary goal of a project must be to provide the initial value that overcomes the chicken-and-egg problem. This is the crucial first step to advance tooling, produce results and scale.
In order to get a project running, we learn from the fishermen’s craft:
- Provide the first value in the form of Big Data Analysis results as quickly as possible, as it has a real impact on the sponsors
- Big Data Science talent needs to enhance its skills and tools continuously to ensure quick success when a new project gets started
I believe that the analogy between the fishermen of Kerala and Big Data Analysis is an interesting one. We have discussed several similarities and seen the different challenges and obstacles Big Data Science teams need to overcome.
In doing so, we found that the fishermen's tooling and processes can be inspiring for our Big Data Science projects. Here are our key learnings:
- Automated Big Data Tools and processes are essential to run in production
- Provide the first productive value as quickly as possible, as it has a real impact on this ecosystem
- Big Data Science talent needs to enhance its skills and tools continuously to ensure a quick catch when a new project comes in
Find below the most important questions and answers from this article.
- What project risks do Big Data Scientists carry naturally?
The explorative work of the Data Scientist is done purely on the assumption that they will likely find value. There is no certainty of a catch on any given attempt.
- What is the chicken-and-egg problem of Data Science?
To get full commitment, sponsorship and unswerving support from business departments, there needs to be evidence of value. Without that evidence and full management commitment, it is hard to get sponsors and support. Data-driven projects need results to win sponsorship, but sponsorship is needed to generate those results.
- How can Data Science project results overcome the chicken-and-egg problem?
They need to provide the first value in the form of Big Data Analysis results as quickly as possible, as it has a real impact on the sponsors.
- What is the role of tooling in Data Science projects?
Automated Big Data tools and defined processes are essential to run in production. This is necessary to ensure testability and reliability.