Data Sharing
Data Sharing is Data Caring ❤️
I have come to the conclusion that there are essentially two types of people when it comes to data: the people who have data, and the people who wish they had it. Put another way, I would rather have the data and not need it than need the data and not have it. For almost two years now I have had the privilege of working for a great data platform company. The amount of data we have access to dramatically increases the return on what we do in the IT/Ops space. It also helps drive collaboration between teams, allows for centralized data sharing, and enables everyone to make data-driven decisions.
In a previous life I had some opportunities to work with various data tools and to use data to help me accomplish my goals. However, I have never had data like I do now. When I worked in the vendor space, a lot of my exposure to data tools was shaped by what our customers wanted. Many of the customers I engaged with wanted to get all their data out of my employer's product. This typically resulted in large projects and many labor hours. There were also almost always caveats with the tools we were using that forced hard decisions.
At one previous job I did have more direct exposure to, and responsibility for, a data tool. It took me four to six weeks to get our test instance set up. I had to configure TLS communication for the Elasticsearch cluster and generate certificates, which was a trial-by-fire process. Then I set up the Filebeat > Logstash > Elasticsearch pipeline. At the Logstash level you had to grok your data and create multiple indices, so you were shaping the data before ingesting it. This had a pretty high learning curve and took a lot of time and effort just to get to a proof of concept. Do not get me wrong here, I do think Elastic is a great tool.
When I first showed up at Snowflake, I honestly did not know too much about the product other than the basic PR material I had read online, some blog posts, and some posts on various sites. When I wanted to ship webhooks from our MDM solution, I got access to our Fivetran instance, and within 30-45 minutes of looking at some documentation and tinkering in my dev environment I had webhooks shipping. I could not believe it took me under an hour to figure this out; from previous experiences I was prepared for it to take weeks. One can also just ship the raw data. No need to grok, no need to transform my data before ingesting it, and I have all of it.
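To make that concrete, here is a minimal sketch of what querying those raw webhook payloads can look like. Every table, column, and value name below is hypothetical; the point is that the payload lands as a raw VARIANT column and gets shaped at query time instead of before ingest.

```sql
-- Hypothetical landing table of raw MDM webhook payloads.
-- The payload stays a raw VARIANT; fields are pulled out at query time.
SELECT
    payload:device.serial_number::string  AS serial_number,
    payload:event_type::string            AS event_type,
    payload:received_at::timestamp_ntz    AS received_at
FROM it_ops.mdm.webhooks_raw
WHERE payload:event_type::string = 'DeviceEnrolled'
ORDER BY received_at DESC;
```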
Enter Data Sharing ❄️
All of our business units ship their data to our data platform. Every department has its own database, with its own schemas and its own tables containing all the data it needs access to. Since Snowflake's data platform lets you have as many databases, warehouses, tables, views, schemas, and so forth as you see fit, it allows for easy data sharing. This means we can all share data with each other, and share only the data we want to share. All of the data is on the same platform, so you aren't spinning up a plethora of servers and then configuring them to access each other. The return you get in saved time and labor is already worth it.
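As a rough sketch of how that limited sharing can work, here is one way a team could expose only selected columns to another team. This assumes both departments live in the same Snowflake account and are separated by database and role; every object and role name is hypothetical.

```sql
-- IT/Ops exposes only the columns it wants Security to see.
CREATE OR REPLACE SECURE VIEW it_ops.shared.device_inventory AS
SELECT
    serial_number,
    os_version,
    last_check_in,
    agent_version
FROM it_ops.mdm.devices;

-- Security's role gets read access to just that view.
GRANT USAGE  ON DATABASE it_ops                       TO ROLE security_analyst;
GRANT USAGE  ON SCHEMA   it_ops.shared                TO ROLE security_analyst;
GRANT SELECT ON VIEW     it_ops.shared.device_inventory TO ROLE security_analyst;
```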
In the past, I dealt with gatekeepers of data; I was a data gatekeeper myself. IT and Security typically work with each other at most organizations, and their goals often align with what the business wants. Still, to get data you had to deal with the gatekeeper of each system, and IT/Ops and Security each typically own several systems, if not more. Often you would end up with the data gatekeeper emailing you a spreadsheet of the data you requested; if you were lucky you got API access to consume the data on your own. That was not a good experience, and it was definitely not efficient. With Snowflake, we can freely share data between IT/Ops and Security. When the raw data is updated from ingest, all the data shares among our teams are also updated. There is no more dealing with a gatekeeper and waiting for a spreadsheet to be emailed to you. I had the opportunity to talk about some of this on the Mac Admins Podcast.
Data wants to be shared freely among teams.
Some things we use this data for (a sketch of a cross-team query follows the list):
- Fleet Intelligence
- Security Posture Checking
- Proactive Troubleshooting
- Agent Health
- Endpoint Data
- Geo Location
- Audits
- Vulnerability Scan Data
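For instance, a security posture check might join IT/Ops endpoint data with Security's vulnerability scan data. The hedged sketch below shows the idea; all table and column names are hypothetical.

```sql
-- Which devices in the fleet currently carry critical or high findings?
SELECT
    d.serial_number,
    d.os_version,
    v.cve_id,
    v.severity
FROM it_ops.shared.device_inventory       AS d
JOIN security_db.shared.vulnerability_scans AS v
    ON v.serial_number = d.serial_number
WHERE v.severity IN ('critical', 'high')
ORDER BY v.severity, d.serial_number;
```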
This is my first post on the subject of data sharing, and I just wanted to share my general thoughts. I will post more down the road.