Discovering the Value of Your Data: An Interview with Shauna Ayers and Catherine Cruz Agosto

In this interview, two STAREAST speakers explain how organizations are discovering the value in their data. Catherine Cruz Agosto and Shauna Ayers define data profiling and its importance, delve into different strategies you can use, and discuss how to get the most out of your data.

Jennifer Bonine: All right. We are back with more virtual interviews at the conference. I am so excited I have Catherine and Shauna with me. Thanks for joining me.

Shauna Ayers: Thank you.

Catherine Cruz Agosto: Glad to be here.

Jennifer Bonine: Ladies, and a really interesting subject that we want to talk about with these guys will be around data profiling and data quality, which I know lots of people are wondering about and asking about; but before we get into kind of the crux of the topic, why don't you tell us, Catherine, a little bit about your background and the organization you work for?

Catherine Cruz Agosto: Okay. As far as my background, I have a bachelor's in software engineering from Embry-Riddle, and I recently got a master's from Penn State in systems engineering. I work at Availity with Shauna, where both of us are data quality assurance analysts. We get into the data. Data's pretty much our lives.

Jennifer Bonine: Data's your thing.

Catherine Cruz Agosto: Yes.

Jennifer Bonine: Shauna, how about you, your background?

Shauna Ayers: I have an unusual background. I came to tech from the humanities side. My degree's actually in English, but I immersed myself in data, particularly a humanities approach to data. The fact that there's a lot of art to the science, and the stories behind it. I've worked at Fortune 500 companies, Johnson & Johnson. I've worked at small shops, mom and pop, and a wide variety, and you always learn something new. The passion for detail and uncovering the puzzle is one of the biggest drives. Once you get into the data of a company, that's the lifeblood. Everything ...

Jennifer Bonine: Data tells a story.

Shauna Ayers: Oh, yeah. Everything that goes on in a company, there are stories the data will tell. You may be able to mark up the presentation of data, but the truth will always exist in there, and you can see so much, and you learn about every aspect of the business when you're dealing with the data of it.

Jennifer Bonine: Yeah. That's-

Catherine Cruz Agosto: That's why data profiling's so important, too, because it gives you a different perspective of the data, and you may uncover things that may not have been obvious.

Jennifer Bonine: For people out there that aren't familiar with the concept, we all know you need data and testing, you need good test data. You need to make sure the data has integrity, that it's quality data that you're utilizing, but ... For those that maybe aren't familiar with the concept of data profiling, because you mentioned that, can you guys just talk a little bit or expand a little bit on what data profiling is? For those that maybe haven't gotten into that or been around that concept.

Shauna Ayers: You want at it first, or me?

Catherine Cruz Agosto: Either way.

Shauna Ayers: One of the things about data profiling is, the only reason you're going to store data is, you want to use it again at some point. Without understanding what you have, you really do not have the ability to use it to its fullest extent and to make it work for you. Any organization, business, you name it, the data is the material. You have applications. You have mechanisms. Those are systems. You've got the process, the actions that go on. You've got those two legs of the stool. The third leg is the material you work on, in software companies and, more and more every day, in every company. Data's that third leg. It's the material that you're working on in everything you do, so if you want to be able to make it work for you and leverage it as a resource, and protect your company from the variety and the unexpected, you have to have a grip on your data.

Catherine Cruz Agosto: Yeah, definitely, understanding the data, not just what pieces there are, what particular features feed into what, but the relationships between them.

Shauna Ayers: The ... Yeah.

Catherine Cruz Agosto: Yeah.

Jennifer Bonine: How all the data fits together and understanding that, and have a good understanding. Now, you said you're a data analyst, right, in your organization. Tell us for people out there who have heard a lot of terms like test analyst or systems analyst, what does it look like when you're a data analyst, and what types of activities are you guys doing on a regular basis, so people know that maybe go, "Wow, we don't have data analysts at my company. We maybe need them"?

Shauna Ayers: Titles get vague. When you're dealing with data management of any stripe, you'll see there's a lot of overlap. We are quality analysts. At the same time, we have a specialized focus. Transformation applications, ETL. Just like a front-end UI application, you have logic, you have code, you have input, you have output. You need to test that what's coming out is what's supposed to. A lot of what we do is that, but ... With data, because any living business continues to grow its data, there's an operational component as well. Because you are continuing to deal with this moving, living, thing that can have, especially in very diverse companies with a lot of different input points or a lot of integration, especially with third parties ...

Jennifer Bonine: Yeah.

Shauna Ayers: ... there can come, within the input, risks that will cause an entire system to go down.

Jennifer Bonine: Right.

Shauna Ayers: You need to have continuous eyes on things. It's not a one-time activity. You need to be able to keep aware of what's coming into your system that could affect how everything's running.

Catherine Cruz Agosto: Yeah.

Jennifer Bonine: Interesting point that you kind of bring up there is: A lot of people worry about data or think a lot about data, when they're doing a transfer of data, right ...

Shauna Ayers: Mm-hmm (affirmative).

Jennifer Bonine: ... an ETL, an extract, transform, load process, or when they're doing a massive migration of data from one system to another; but I think maybe a misconception, it sounds like, is just like sometimes we hear with automation, "You just put it in place and let it run and it just goes, right, but we don't need to do anything with it." Same thing with data, right?

Shauna Ayers: Mm-hmm (affirmative).

Jennifer Bonine: You can't just do it once and go, "Oh, we're good. The data's in." Right?

Shauna Ayers: Yeah.

Jennifer Bonine: It's that maintenance and continually looking at it and ensuring that you're not causing major issues in your systems.

Shauna Ayers: As more and more companies are moving toward DevOps, it's right hand in hand. Any activity that goes on is recorded in the data, and that data is an input to other processes, particularly analytics and decision-making. Kind of a joke we came up with, for analytics, is, "You can't just put a chicken in a blender and expect to get chicken salad out of it."

Jennifer Bonine: Exactly.

Shauna Ayers: You got to know what you have, and thus data profiling. You have to understand the relationships, the flow, and what are the properties of the data you have, because each one of those drives something. With increasing automation, that automation is often driven by the data values. One unknown, and your system goes, "I have no idea what to do with that," and it's gone.
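To make the "one unknown" point concrete, here is a minimal sketch in Python of one way to spot values that downstream automation would not recognize. The function name, the status codes, and the sample data are all hypothetical, invented for illustration; nothing here is taken from the speakers' session.

```python
from collections import Counter

def find_unknown_values(values, known_values):
    """Count the values that fall outside the set downstream automation expects."""
    known = set(known_values)
    return Counter(v for v in values if v not in known)

# Hypothetical status codes arriving from an upstream feed.
statuses = ["PAID", "DENIED", "PAID", "PENDNG", None, "PAID"]

# Hypothetical set of codes the automation knows how to handle.
unknown = find_unknown_values(statuses, {"PAID", "DENIED", "PENDING"})

for value, count in unknown.items():
    print(f"unexpected value {value!r} seen {count} time(s)")
```

Run on each incoming batch, a check like this surfaces the misspelled code and the missing value before an automated process trips over them.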

Jennifer Bonine: Exactly.

Catherine Cruz Agosto: Yup. Yeah.

Jennifer Bonine: So true, and we see that a lot with automation, where people get stuck is, "How do I get the data in, and am I pulling in the right data consistently to be able to test something and make sure it continues to run?" Now, you guys did a session here at the conference on this, and maybe you can give just kind of a high level of some of the key points of what you talked about, for people that didn't get a chance to see that session, obviously our virtual audience; so maybe some high points of what you guys talk about in your session that you give around this topic.

Catherine Cruz Agosto: Okay. Yeah, basically, we go into, in general, what is data profiling, like we've discussed before. Also, you don't necessarily need a tool to profile. You can use what you have at hand, or even create your own tools. The different classifications for profiling are a little bit of what we go into: doing the different data sets, defining tolerances, what have you.
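As a rough illustration of profiling with what you have at hand, the sketch below uses only the Python standard library to summarize one column and compare it against a tolerance. The column names, the sample rows, and the 10 percent null-rate tolerance are assumptions made up for this example, not details from the session.

```python
from collections import Counter

def profile_column(rows, column):
    """Summarize one column: row count, null rate, distinct values, top values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    total = len(values)
    return {
        "count": total,
        "null_rate": (total - len(present)) / total if total else 0.0,
        "distinct": len(set(present)),
        "top_values": Counter(present).most_common(3),
    }

def within_tolerance(profile, max_null_rate):
    """Return True when the share of missing values stays within the tolerance."""
    return profile["null_rate"] <= max_null_rate

# Hypothetical sample rows; a real data set would come from a file or a query.
rows = [
    {"member_id": "A1", "state": "FL"},
    {"member_id": "A2", "state": "FL"},
    {"member_id": "A3", "state": ""},
    {"member_id": "A4", "state": "GA"},
]

state_profile = profile_column(rows, "state")
print(state_profile)
print("within tolerance:", within_tolerance(state_profile, max_null_rate=0.10))
```

Here one of the four rows has an empty state, so the null rate is 0.25 and the column falls outside the assumed tolerance. The same summary, taken on a schedule, gives a baseline to compare new loads against.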

Shauna Ayers: A lot more of the "how." One of the things we've noticed, starting with a data quality session we gave last year, is that we got a lot of good and useful input from STAREAST; but a lot of questions were on profiling, so we aimed for profiling this time around.

Catherine Cruz Agosto: Yeah.

Jennifer Bonine: This year.

Catherine Cruz Agosto: Yeah.

Shauna Ayers: There's a lot, there was nothing at all just a few years ago, practically, for data and data quality. Now, there's more out there, but a lot of it's at the white paper level. A lot of this, "Yes, this is a good thing to do. You need to do this," but nothing on how; so one of the things we're wanting to do is, as we build community, and STAREAST, TechWell, community, all of this is prime, prime opportunity. We want to build more of the resources and communications so that people get a better understanding of how, not just what needs to be done. Because inheriting a, "Here, go profile this." "Now what?"

Jennifer Bonine: Right. "What do I do with that?" Yeah.

Shauna Ayers: Exactly.

Jennifer Bonine: Exactly, it's like, "Oh, I've got awareness. I know I need to do something, but then what do I do with that?"

Shauna Ayers: Exactly.

Jennifer Bonine: For folks out there who are saying, "Wow, this is my awareness," because for some people it's, they're just getting that awareness of, "I get this is a good thing. How do I do it?" I know you mentioned there isn't a lot out there, but are there any particular books, blogs, resources, things like that where people are trying to get the "how," that you would recommend or suggest?

Shauna Ayers: Consistently, it's currently kind of scattered in odd places. The Data Warehousing Institute has a lot of information because they've had to. They were some of the first ones to have resources on the topic that were usable and useful. They have a website that has a good deal of material. Some of the big data aggregators have been having more and more articles. It's more a matter of finding what aspect you're looking for, and it's not a centralized venue yet.

Jennifer Bonine: No.

Shauna Ayers: There's still a lot of opportunity, and especially if we want to build some forums and communities. I see that some of the things that have come up on TechWell we've started to actually leverage that opportunity to get people in touch with each other. We've spoken to folks during the one-on-ones, and we're trying to kind of share knowledge and build that knowledge base that doesn't already exist.

Jennifer Bonine: Yeah, so interesting that not kind of there yet, but there's some places you kind of can poke around and start.

Shauna Ayers: Oh, yeah, and people are asking the right questions now.

Jennifer Bonine: Yeah. That's good.

Shauna Ayers: They're finding out ... Sometimes they'll find out the hard way, but they're finding out more and more that you cannot have the mechanism in isolation from what it is working on.

Catherine Cruz Agosto: Yeah.

Jennifer Bonine: Yeah, and what about when we talk about data, another big component that we hear about now is security and securing your data?

Shauna Ayers: Oh, yeah.

Jennifer Bonine: How does that play into what you guys do around securing and making sure that your data isn't compromised or breached by someone else?

Shauna Ayers: There is overlap. No matter what security mechanism you are employing, understanding what you're trying to secure is needed in order for you to be able to aim it right, basically. We have done security checks from the app level, app-enforced, all the way down to individual data ... The actual, at the data level. You're only allowed to see the records that you're supposed to. There's data masking questions we've been receiving a lot lately, test data generation. We work in healthcare. Patient information ...

Jennifer Bonine: PHI. Yeah.

Shauna Ayers: ... HIPAA regulation. You'll find financial companies have the same problems. A lot of industries have to find a way of securing, and especially when they're dealing with integration with outside bodies, that creates new challenges. You can't just grab some data from production, bring it down to a test or a QA environment and run with it. You have to somehow generate something that gives you the accurate representation, something you can use to really give it a good test, without exposing everything. That takes a lot of planning and awareness.
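One common way to get test data that still behaves like the real thing is to replace sensitive values with consistent stand-ins. The Python sketch below uses a keyed hash so the same input always maps to the same token, which keeps joins between masked tables working. The field names, the key, and the record are hypothetical, and this is only an illustration of the idea; it is not a compliance-ready de-identification method and not necessarily what the speakers use.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be kept out of test environments.
SECRET_KEY = b"replace-with-a-secret-kept-out-of-test-environments"

def mask_value(value, key=SECRET_KEY, length=12):
    """Replace a value with a keyed-hash token; equal inputs give equal tokens."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

def mask_record(record, sensitive_fields):
    """Return a copy of the record with the sensitive fields masked."""
    return {
        field: mask_value(str(value))
        if field in sensitive_fields and value is not None
        else value
        for field, value in record.items()
    }

# Hypothetical production-like record.
patient = {"patient_id": "123-45-6789", "name": "Jane Doe", "visit_count": 3}
masked = mask_record(patient, {"patient_id", "name"})
print(masked)
```

Non-sensitive fields such as the visit count pass through unchanged, so tests that depend on their real distribution still get a realistic picture.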

Catherine Cruz Agosto: Understanding of the data itself.

Jennifer Bonine: Yeah. Absolutely. I think you guys have given us some good things to think about. The time goes so quickly. I'm sure people have a lot more questions for you two. What is the best way, if they have more questions, and we've kind of piqued their interest on this topic, for them to find you guys or get in touch with you?

Shauna Ayers: We've got email. We actually have a little flyer we're handing out at the speaking, and that has a little bit of a, we're going to use Google to collaborate a little bit.

Catherine Cruz Agosto: Yeah, if anybody has specific questions, yeah.

Shauna Ayers: We've set up a space to share examples.

Jennifer Bonine: Okay, so—

Shauna Ayers: Then, the STAREAST has its own, has the actual materials from the presentation there, and we should have contact information through our profiles there too, I think, don't we?

Jennifer Bonine: Okay.

Catherine Cruz Agosto: I think so, yeah.

Shauna Ayers: Yeah, we should.

Jennifer Bonine: Hopefully, so your contact information will be out, if they go to the TechWell website, where they can get your presentation on data profiling and data quality, and then your contact information will be out there as well ...

Shauna Ayers: Yes.

Jennifer Bonine: ... so they can find you.

Shauna Ayers: We encourage questions to come in, because the more questions we get, the more we understand what people want to know first, and we can start getting that ball rolling for getting the information out there for folks. Because it's a really broad topic, a really deep topic, and there just needs to be a lot more discussion on it.

Jennifer Bonine: Yeah. Well, hopefully you all enjoyed getting a chance to hear from these guys, and we've piqued your interest on data profiling and data quality, and tune in for our next interview. Thanks, everyone.

Shauna Ayers: Thank you.

Jennifer Bonine: Thanks, you guys, for joining me.

Shauna Ayers: Thank you.

Catherine Cruz Agosto: Thank you.

Catherine Cruz Agosto found that her software engineering experience at Baxter Healthcare and Boeing-subsidiary Insitu provided an excellent foundation for finding more effective and user-friendly approaches to complex technical problems. Catherine has developed more efficient and innovative data quality testing solutions at healthcare intermediary Availity, expanding their automated data quality testing processes to accommodate diverse and dissimilar data sources, facilitating analysis, testing, and controls for data integration, analytics, and healthcare data reporting.

Shauna Ayers has been untangling the Gordian knots of IT systems for more than seventeen years, analyzing data systems and testing both software and data quality in the manufacturing, medical device, and healthcare industries. Shauna found her passion in developing creative solutions for the analysis and testing of sensitive and highly-regulated data sets at industry leaders such as Blue Cross Blue Shield of Florida (now Florida Blue), Vistakon (a subsidiary of Johnson & Johnson), and Availity.
