Whether it’s recent news or just new to you, every two weeks the Data Planet serves up fascinating insights and resources from the data analytics and BI world.
Our snack-size summaries skip straight to the point.
This week’s edition of the Data Planet includes:
- What’s Really Going on With Data? Results of Data Professionals Survey
- No Single Data Repository Can Be Your Silver Bullet
- What’s New in Dataflows? A Look at Reverse ETL
- Software Worth Sharing: Tidyverse
What’s Really Going on With Data? Results of Data Professionals Survey
Find out how data professionals at organizations across the globe responded to questions about data, governance, integration, and the challenges they face. The survey results are full of interesting insights and statistics, and they’re worth the download. Keep in mind that Fivetran financed this survey and compensated respondents for participating.
Key takeaways:
- “Data-driven decisions enable the business, but many organizations are failing.”
- “Pipelines are slow to build and difficult to maintain, impacting the business.”
- “Data engineering needs better leadership.”
Get the Results: 2021 Data Engineers Global Survey
No Single Data Repository Can Be Your Silver Bullet
Even in the simpler times of the 1990s, you couldn’t store all the data you needed for analytics in a single data warehouse. Data lakes didn’t solve the problem either. Imagine the challenge today, with a tsunami of data.
Data virtualization is a solution that takes a very different approach. Get a glimpse into the merits of a data virtualization layer in this short but informative article.
See Why No Single Data Repository Can Be Your Silver Bullet
The Power of Denodo: An Inside Look at The Leading Data Virtualization Platform
What’s New in Dataflows? A Look at Reverse ETL
In ETL, the assumed direction of data is from source systems to the data warehouse or data lake. Reverse ETL is about taking data from your data warehouse and putting it back into operational systems.
Hundreds to thousands of hours are often required to clean and organize data in data warehouses. It makes sense to leverage that effort for more than reporting. But there are a couple of problems with this approach.
First is latency. Most data warehouses are built with the assumption that data will be loaded in big batches, so anything pushed back to operational systems is only as fresh as the last load. Depending on your business processes, this latency could be an issue. Second, data warehouses don’t work well as transactional systems. Warehouses are designed for a small number of large queries, while transactional systems handle frequent small ones.
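To make the direction of travel concrete, here is a minimal Python sketch of one reverse ETL batch. It is an illustration under stated assumptions, not how any particular vendor does it: two in-memory SQLite databases stand in for the warehouse and an operational CRM, and the table names (customer_metrics, contacts), columns, and function names are all hypothetical.

```python
import sqlite3


def build_demo():
    """Create stand-in databases (hypothetical schemas, for illustration only)."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE customer_metrics (customer_id INTEGER PRIMARY KEY, lifetime_value REAL)"
    )
    warehouse.executemany(
        "INSERT INTO customer_metrics VALUES (?, ?)", [(1, 1200.0), (2, 310.5)]
    )

    crm = sqlite3.connect(":memory:")
    crm.execute(
        "CREATE TABLE contacts (customer_id INTEGER PRIMARY KEY, name TEXT, lifetime_value REAL)"
    )
    crm.executemany(
        "INSERT INTO contacts (customer_id, name) VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )
    return warehouse, crm


def reverse_etl_sync(warehouse, crm):
    """Push one batch of curated warehouse values into the operational table."""
    # One large read from the warehouse, which is the access pattern it is built for.
    rows = warehouse.execute(
        "SELECT customer_id, lifetime_value FROM customer_metrics"
    ).fetchall()
    # Many small writes to the operational system, keyed by customer.
    crm.executemany(
        "UPDATE contacts SET lifetime_value = ? WHERE customer_id = ?",
        [(value, customer_id) for customer_id, value in rows],
    )
    crm.commit()
    return len(rows)  # number of warehouse rows sent in this batch


if __name__ == "__main__":
    wh, crm_db = build_demo()
    print(reverse_etl_sync(wh, crm_db), "rows sent")
    print(crm_db.execute("SELECT * FROM contacts ORDER BY customer_id").fetchall())
```

Because the sync runs as a batch, the CRM only sees values as of the most recent run, which is the latency issue described above.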
Reverse ETL is still a new concept that will probably take a few years for the market to sort out. Meanwhile, companies like Hightouch are offering solutions, and we’re interested to see where this might go.
See the Connection: Modern Data Warehouses and Reverse ETL
Software Worth Sharing: Tidyverse
Tidyverse is a set of highly regarded R libraries whose goal is to give R developers a better user experience. Most seasoned R developers have probably used dplyr and ggplot2. But there are other interesting packages as well, including the following, which are described on the Tidyverse site.
- ggplot2: Plots and graphs for R
- dplyr: Data manipulation
- tidyr: Data tidying
- readr: Fast reading of CSV and other rectangular data files
- purrr: Adds some functional programming utilities
- tibble: A reinvention of the standard R data frame
- stringr: Set of functions for working with string (character) data
- forcats: Set of tools for dealing with factors