Top 7 Ways Data Scientists Can Stop Wasting Their Time Processing and Cleansing Bad Data!

Are you an accomplished data scientist who spends much of your time processing, cleansing and organizing data?

Do you ever feel like this is a waste of your time and a poor use of your talents?

Unless your company has a dedicated data integration team building a data warehouse for your use, you likely spend a fair chunk of your day wrangling and cleaning up data or learning a new, complex source system. In fact, CrowdFlower claims that data scientists spend 80% of their time preparing data instead of modeling.

As a data scientist, you’re arguably the most valuable and highly paid member of your data team. How happy would your boss’ boss be to learn that this investment in you is being wasted on lower-value tasks?

I’ve spent 20+ years as a data professional in data management and analytics, and 10 years in leadership development, communication and culture change. The purpose of this article is to help advanced analytics teams get better answers to more questions, quickly.

I found three reasons this problem may exist, some of which are in your control:

Lack of Awareness: Your management doesn’t realize how much time you actually spend doing data preparation and cleanup tasks that could be economically offloaded to another team member.

It may be awkward to speak openly with your boss, “confessing” that you only spend 20% of your time doing what you were hired for: predictive modeling. Yet the data quality issues are real and have to be addressed for your models to work. Your company has a choice: handle the data quality and governance issue at the organizational level so you can spend your time on effective work, or keep doing what it is doing and lose its competitive edge while you move through the cleanup process at a fraction of the speed and at increased cost.

You Don’t Have a Data Driven Culture: Your company’s culture doesn’t yet value decision-making based on advanced analytics.

Unless your company is a startup, you likely have data integration and analysis staff working for your company. The problem is that data science is considered a “side” show rather than the “main” show. When you educate your management about the capabilities of predictive data models in your industry and the success your existing models have produced, the culture will shift toward data-driven decision-making.

Company Politics: A variety of organizational impediments, constraints or dysfunctions can result in budgets or resources being allocated to lower-priority projects instead of high-value projects like yours. In most cases, your direct manager hasn’t made a compelling case to allocate the ideal data resources, such as a data architect, ETL developer, or BI analyst, to support your project. That is no knock on your boss: we all sometimes struggle to stick our necks out when dealing with politics, or when we are unsure how to win decision-makers over. You can help your boss make this compelling case.

Nothing frustrates me more than time and energy being unnecessarily wasted and great talent being underutilized. So what can you do to eliminate these problems so that you can focus on what you love and are awesome at, rather than tasks you don’t enjoy and are likely not even trained to handle?

Top 7 things your company can do today so you can focus on predictive modeling rather than data cleanup:

1. Re-allocate another person in your organization to your analytics project.

Rather than spending a majority of your time processing and preparing data, you could get support from someone in your company who already has knowledge of the systems and data. Depending on the need, this person (or people) may need to be allocated to your project full-time or just part-time. A good BI analyst or report writer could prepare the data and queries in a format that’s ready for your model to consume.

To get the resource reallocation, you will likely need to convince your bosses that your modeling efforts will produce a higher value than other projects. Senior management must deliver specific key results which your project can help them deliver. Your focus should be on the value of creating 2-3 times the throughput for each data scientist in the organization.

2. Hire a temporary contractor to handle your data processing and prep issues.

If you can’t get an internal resource, could your manager hire a temporary or part-time contractor to offload part of the data quality work? Look for a “data engineer” who has experience with the source system you are using for your analytics project. This saves the weeks or months that person would otherwise spend learning the nuances of that source system, so you get good data sooner and can begin training your predictive model.

Better yet, if this data professional has data modeling and ETL-type experience, you can specify the data structure you need and they can aggregate the data and deliver it in that format.

3. Hire a full-time data engineer if your analytics team has more than one data scientist.

If you have two or more data scientists on your team, together you are likely spending at least 1 FTE doing data processing and preparation. Thus it makes sense to hire a full-time data architect, ETL developer or BI analyst to support you.

If you and one fellow data scientist are each spending 80% of your time on non-modeling tasks, that is the equivalent of 1.6 FTE doing tasks that are more appropriate for a data professional specialized in data quality and warehousing issues. You’ll never eliminate all non-modeling tasks, and I’m not suggesting an ideal allocation for modeling vs. non-modeling tasks (feel free to comment below if you have a thought), but these little adjustments to the team can certainly make a big impact.
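To make that arithmetic concrete when you talk to your boss, here is a minimal Python sketch. The function names and the "after" percentages are hypothetical, chosen only for illustration; the 80% figure is the one quoted above. It computes the FTE your team spends on data prep, and how many times more modeling time each data scientist gets if part of that prep is offloaded.

```python
def prep_fte(num_scientists: int, prep_share: float) -> float:
    """FTE the team spends on non-modeling work (prep_share is a fraction, 0 to 1)."""
    if num_scientists < 0 or not 0.0 <= prep_share <= 1.0:
        raise ValueError("need num_scientists >= 0 and 0 <= prep_share <= 1")
    return num_scientists * prep_share


def modeling_throughput_gain(prep_before: float, prep_after: float) -> float:
    """Ratio of modeling time after offloading prep work to modeling time before."""
    for share in (prep_before, prep_after):
        if not 0.0 <= share <= 1.0:
            raise ValueError("prep shares must be between 0 and 1")
    if prep_before == 1.0:
        raise ValueError("no modeling time before the change, so the ratio is undefined")
    return (1.0 - prep_after) / (1.0 - prep_before)


if __name__ == "__main__":
    # Two data scientists, each at 80% non-modeling work (the figure used above).
    print(f"FTE spent on data prep: {prep_fte(2, 0.8):.1f}")
    # Hypothetical targets: prep work drops from 80% to 60% or 40% of the week.
    for after in (0.6, 0.4):
        gain = modeling_throughput_gain(0.8, after)
        print(f"Prep at {after:.0%}: {gain:.1f}x the modeling time")
```

Under these assumed targets, cutting prep from 80% of the week to somewhere between 60% and 40% is what doubling or tripling a data scientist's modeling time looks like. Plug in your own team's numbers rather than relying on these.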

Once your management recognizes the extensive cost and inefficiency of having the data science team handle the data quality and access issues, they will likely support your plan.

4. Shift to a Data Driven Culture by aligning the predictive modeling efforts with key results and objectives of the business.

The C-suite is concerned with key results such as increasing revenue, reducing costs, eliminating waste and reducing risks. If senior management understands which advanced analytics measures and models map to their key results, and begins to understand how those metrics and predictions will help them achieve those results, they will allocate more resources in your favor.

There may be organizational mistrust of predictive analytics if leaders are more used to making decisions by gut. Since it may require a little persuasion initially, show them outside case studies from your industry if you haven’t been able to implement the solution yet.

5. Skip the data warehouse – use the source system rather than wait for the data to be brought into the data warehouse.

This may offend many of my peers. However, if the goal is to get the predictive models working quickly, it may be more effective to use the source system initially. Moving the required source system data into a data warehouse likely takes 3-6 months or more, since the data model and all the ETL processes to load it have to be built properly. If the business can’t wait that long to get the model working, start with the raw data from the source system, then switch the data source later on when the data warehouse is ready.

6. Purchase and implement an industry data model so data scientists don’t have to re-learn each new data source they may need to include.

Imagine what it would be like to never have to learn another source system again. Industry models from IBM, Teradata and others provide predefined data structures and built-in analytic capabilities, which is an advantage over custom-built data warehouses.

Though there is an initial overhead in integrating the data into the data warehouse, the responsibility for data integration falls on a specialized team or vendor, who deal with the data processing issues and load the data. If you can afford the initial time and cost to build the data warehouse, you can reap the long-term rewards of having a single place for doing predictive modeling. That frees you up to focus on the analytics, not the data messes.

7. If all else fails, find a new company.

Not every company is ready to appreciate the capabilities and experience you bring to the table. Organizational culture changes can sometimes take years. Be honest with yourself about how ready and willing your organization is to make the needed changes. You owe it to yourself and your career to put yourself in the best environment possible where you can reach your highest potential and help your company do the same.

Summary

It’s in your company’s best interest for you to be doing high-value data science activities. If management doesn’t address the data quality and governance issue quickly, your company may lose its competitive advantage to other companies that act more quickly. Though fixing that problem is not your job, exposing the problem and proposing solutions is.

Your company can’t afford to have you doing non-analytic work, when there are likely dozens of qualified people in your company who could help.
