I could start this article by saying that unstructured data is a massive problem for security professionals. However, I might be making the biggest understatement of a very crazy year by doing so.
Today’s Challenges
The challenges faced today with the explosion of documents and files stored in the most obscure places are well-known to anyone who works in IT. Files show up where you least expect them to be.
In a world focused on the cloud and sharing, we hope that unstructured data is stored where it is protected. Instead, we find salary documents in public shared folders, documents containing API keys in cloud storage buckets, and regulated data such as health records left unsecured and accessible to anyone with the know-how to find them.
Keeping this problem in check is a never-ending task. Traditional data loss prevention (DLP) solutions focus on what happens when a document is sent out of the organization by an unapproved method. DLP can stop you from sending a file containing Social Security numbers through email, for example. It does this in the most basic of ways: DLP is programmed with a pattern to look for, and it blocks anything that matches. For example, it looks for the pattern XXX-XX-XXXX to block an SSN from being sent. However, what happens if I take out the dashes and send a 9-digit number? Is it blocked? What if that 9-digit number is part of a Zoom meeting link instead? Would your organization have issues if DLP blocked Zoom invites because of pattern matching?
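To make the pattern-matching trade-off concrete, here is a minimal Python sketch. The two regular expressions and the `is_blocked` helper are my own hypothetical stand-ins, not the rules of any particular DLP product. The strict pattern misses an SSN with the dashes removed, while the loose pattern also catches a harmless meeting number.

```python
import re

# Hypothetical rules for illustration only; real DLP rule sets are more elaborate.
DASHED_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # matches XXX-XX-XXXX
ANY_NINE_DIGITS = re.compile(r"\b\d{9}\b")          # matches any standalone 9-digit run


def is_blocked(message: str, pattern: re.Pattern) -> bool:
    """Return True if the outbound message matches the rule and would be stopped."""
    return pattern.search(message) is not None


with_dashes = "Employee SSN: 123-45-6789"
without_dashes = "Employee SSN: 123456789"
meeting_invite = "Join Zoom meeting 987654321 at 3 pm"

# The strict rule catches the dashed form but misses the undashed one.
strict_results = (is_blocked(with_dashes, DASHED_SSN),
                  is_blocked(without_dashes, DASHED_SSN))

# The loose rule catches the undashed SSN, but also blocks the meeting invite.
loose_results = (is_blocked(without_dashes, ANY_NINE_DIGITS),
                 is_blocked(meeting_invite, ANY_NINE_DIGITS))
```

Tightening the rule lets sensitive data slip through, and loosening it blocks legitimate traffic. Neither version knows what the number actually means.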
To stop the unstructured data security problem, you need to ensure you can find the data and understand if it’s critical to the organization. A draft of the CFO’s next great novel on a shared drive is an issue but not a problem. A draft of the next quarter’s bonus schedule shared in the same directory is a big problem. The methods used to identify this data need to be intelligent enough to discern the data’s criticality while providing a way for security professionals to remediate the problem.
Circling Around With Concentric
If you said to yourself, ‘This kinda sounds like a job for AI,’ then you’re not far off the track. I recently had an opportunity to talk to Concentric, a new data access governance company trying to leverage the power of AI to fix the issue above. Concentric Semantic Intelligence is the tool you can use to scan your enterprise to find the unstructured data you need to be worried about. Unlike DLP devices, Concentric is focused on finding the data at rest, not in outbound transmission. Concentric is not bound by traditional rule definitions or regular expression (regex) pattern matching. Instead, it uses AI to look at the context around the data and understand if it might be critical.
One of the most common themes in the nightmare stories I hear from compliance officers is data shared on sync-and-share file services. Consider a scenario where one of your knowledge workers is running behind on a project and needs to spend part of their evening working on a potential customer contract. Because of corporate policy, there is no external access to the SharePoint server where it is located. To make things easier, the employee downloads a copy and saves it in their sync-and-share service folder. They head home, modify what needs to be changed, and wait until tomorrow to upload the new version to SharePoint in the office.
What happens to that contract once the changes are uploaded to the repository? Is it deleted from the sync-and-share folder? Do its permissions prevent it from being shared, or can it be viewed by the entire world? What if the user forgets about it? Worse, what if the user leaves the company and then finds it later when they’re working for a competitor?
All these questions are problems for any security team to manage. Concentric gives you the capability to see that the document was uploaded to a sync-and-share service, that it has a questionable title for a document outside a secured environment, and that there is personally identifiable information (PII) in it. Concentric can then alert you that the document is stored improperly, either in a bad location or with improper file permissions. From there, it can help your team remediate the situation. You could block the copy or trigger an alert to the user, letting them know they are potentially opening a security issue. Concentric can even give hints to the user on how to fix the problem, such as focusing on permissions.
Concentric can do this quickly because it’s not looking for defined patterns. Instead, it’s looking for context. Who is accessing these documents? What does the file look like? Does it have lots of dollar signs or numbers that look like SSNs or salaries? Does it have strange permissions for a file stored in a public location? These are all triggers that help the AI in Concentric figure out something is amiss. Because it works on all unstructured data, no matter the location, it can help you find these problem areas on-prem or in the cloud.
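As a rough illustration of what "looking for context" could mean, the sketch below combines a few of the signals mentioned above into a list of reasons a file deserves a second look. It is a toy heuristic of my own, not Concentric's model; the `FileInfo` fields, the `context_signals` function, and the threshold of three dollar signs are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass
from typing import List

# Loose SSN-like pattern, with or without dashes (illustrative assumption).
SSN_LIKE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


@dataclass
class FileInfo:
    """Hypothetical summary of a file: its text plus where and how it is stored."""
    text: str
    location_is_public: bool
    world_readable: bool


def context_signals(f: FileInfo) -> List[str]:
    """Collect the reasons a file looks risky, rather than matching one fixed pattern."""
    signals = []
    if f.text.count("$") >= 3:  # arbitrary threshold chosen for this sketch
        signals.append("many dollar amounts")
    if SSN_LIKE.search(f.text):
        signals.append("SSN-like number")
    if f.location_is_public and f.world_readable:
        signals.append("open permissions in a public location")
    return signals


bonus_schedule = FileInfo(
    text="Q3 bonus: Alice $12,000; Bob $9,500; Carol $15,000",
    location_is_public=True,
    world_readable=True,
)
novel_draft = FileInfo(
    text="Chapter 1. It was a dark and stormy night.",
    location_is_public=True,
    world_readable=False,
)

bonus_signals = context_signals(bonus_schedule)
novel_signals = context_signals(novel_draft)
```

The bonus schedule trips two signals at once, while the novel draft trips none even though it sits in the same public location. A real system would presumably weigh far richer context than this, but the shape of the decision is the same: several weak clues together say more than one rigid pattern.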
The latest addition to the Concentric platform is the idea of Risk Distance. Now, Concentric can scan all the files in proximity to gain additional context and understand the risk profile of a particular document or piece of data. Storing a health record with other health records is okay, but copying it to a folder full of recipes and pictures from your phone is a bad idea. Concentric can see this and prevent the issue while still ensuring the business can operate instead of locking down all file sharing until the issue is resolved.
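I don't know how Concentric actually computes Risk Distance, so treat the following as a loose sketch of the general idea: a sensitive file is judged by the company it keeps. The category labels, the `SENSITIVE` set, the `misplaced_sensitive_files` function, and the 0.5 threshold are all hypothetical, and the sketch assumes each file has already been assigned a category.

```python
from typing import Dict, List

# Categories treated as sensitive in this sketch (an assumption, not a vendor list).
SENSITIVE = {"health record", "payroll"}


def misplaced_sensitive_files(folder: Dict[str, str], threshold: float = 0.5) -> List[str]:
    """Flag sensitive files whose neighbors are mostly of a different category.

    `folder` maps file name to an already-assigned category label.
    """
    flagged = []
    for name, category in folder.items():
        if category not in SENSITIVE:
            continue
        neighbors = [c for n, c in folder.items() if n != name]
        if not neighbors:
            continue  # a lone file has no surrounding context to compare against
        share_similar = sum(1 for c in neighbors if c == category) / len(neighbors)
        if share_similar < threshold:
            flagged.append(name)
    return sorted(flagged)


clinic_folder = {
    "chart_001.pdf": "health record",
    "chart_002.pdf": "health record",
    "chart_003.pdf": "health record",
}
phone_backup = {
    "lasagna.txt": "recipe",
    "cookies.txt": "recipe",
    "beach.jpg": "photo",
    "patient_chart.pdf": "health record",
}

clinic_flags = misplaced_sensitive_files(clinic_folder)
phone_flags = misplaced_sensitive_files(phone_backup)
```

The same health record is unremarkable among other health records and stands out among recipes and vacation photos, which is the intuition behind the example above.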
One thing that Concentric doesn’t do is encryption. They look for the files only. They do not secure them aside from checking permissions and locations. I feel this is a key distinction for them. Instead of finding the data and locking it up with an encryption algorithm, they promote proper data hygiene. Users will continue risky behavior if they aren’t made aware that what they are doing is improper. Encrypting the data at rest doesn’t reduce the rate at which it is shared. What if the data is stored somewhere it can’t be encrypted? What if the data is sent in the clear to a system outside your control? How will you keep it secret then? The Concentric approach of detection and remediation works much better, in my opinion.
Bringing It All Together
Try as we might, we are never going to corral unstructured data sharing. It was critical to the way we work even before the onset of a global pandemic. We need to share data to collaborate, and we need to do it in a way that adjusts to our workflows and not the other way around. Rather than restricting how we do things, we need solutions like Concentric to detect when we’ve done something unsafe and help us get better about our data hygiene. With a little AI help and some proper procedures, we can ensure our unstructured data doesn’t become unsecured data.
For more information about Concentric and their Semantic Intelligence data access governance solution, make sure you check out http://Concentric.ai