UPDATED 15:30 EDT / FEBRUARY 25 2015

On cleaning data lakes and creating analytics for old nerds | #BigDataSV

The only way to scale is to let citizens collect data. That’s the guiding principle behind EMC’s sustainability project, which provides citizen scientists and environmental non-profits with a collective platform for sharing and analyzing flora and fauna data. But the feel-good project doesn’t come without the real challenges of delivering analytics services at scale, as we learn from one EMC executive.

The goal to democratize data

In a live interview with SiliconANGLE’s roving news desk theCUBE, Distinguished Engineer John Cardente from EMC’s Corporate CTO Office spoke about his company’s collaboration with the National Park Service, Earthwatch, and a variety of environmental non-profits to “bring tech to bear on something meaningful for the climate.”

The problem, he said, is that right now “the data sets to dig into this are all separated.” EMC data lake technology allows scientists to collaborate much more easily, he added.

EMC data lake technology is particularly well suited to this type of task because it’s “an analytics environment,” Cardente said. For this particular project, he continued, the data lake is ideal because it allows the pool of data to scale. And the “only way to scale is to allow citizens to collect data,” he added.

Challenges: lack of access to tech, infrequent data updates

Cardente then shed some light on the challenges EMC faces with this project. In October, he took part in a one-week expedition to Acadia National Park meant to bring the people and the science together. While enlightening, the trip revealed some of the difficulties this project has to overcome: “A big proportion of citizen scientists are elderly, or don’t have access to technology,” Cardente explained, so EMC is “actively working to figure out how to accommodate those folks,” and “overcome the lack of technology.” Cardente remarked that EMC hopes to develop a system of alerts that will enable citizen scientists to observe significant environmental events, like bird migrations.

At present, the data EMC is working with comes mostly from “one time dumps” supplied by non-profits. Cardente said EMC is “working with partners to make connectors and feeds so we can get incremental updates.” In that way, he hopes to “act as a cleaning house and a hub” through which scientists and citizen scientists can share data.

When asked why EMC decided to lend its efforts to addressing climate change, Cardente commented wryly: “100 percent of our customers live on this planet.” While EMC is “really excited to build something meaningful,” Cardente added that this project “is a great proving ground to try some technology.” It’s a win-win, he implied, saying “it gives us a good demo vehicle for our technology, but also [the chance] to give back to the community.”

Watch the full interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of BigDataSV 2015.

