Privas - Assuring privacy in data exploring systems

Modern technology makes it easy to extract, store, process and use information about individuals and organizations. The growing amount of collected data, and its value to society, was at first a great advance: it could be used to optimize processes, find solutions and support decisions. It also brought new problems, namely the loss of privacy and malicious attacks on confidential information.
Among other things, Privas anonymizes databases. It is intended for data publishers who need to protect information from attacks while controlling both the desired privacy level and the usefulness of the data. These requirements are specified in a DSL (PrivasL). The novelty of this work is the automation of the repository transformation, which is based on language processing techniques.
The tool
The Privas tool was built as two separate pieces of software.
The main component is the Privas framework, which preserves privacy in different types of repositories using a description of the repository and of its privacy needs (the PrivasL description).
The web platform offers a visual tool to describe repositories and their privacy needs and to generate the PrivasL description.
For a deeper understanding of each component, consult the dissertation that produced this work.
The steps to compile and run each component are given next.
The framework
Compile the framework
mvn package
The Privas framework can be compiled by running the Maven package phase.
Run framework
java -jar target/privas.jar FILENAME_PRIVASL
The Privas framework is started with the java command, passing the path of the PrivasL file as its argument.
Example
PrivasL description
CSV
Path: "fileCSV.csv"
file
data [ &age, &work_class, &education_num, &marital_status,
&occupation, &race, &sex, &native_country ]
Prevent from: [ recordLinkage ]
To run the example, you can start by downloading this census CSV.
The PrivasL description of the CSV can be found on the right. This information can also be downloaded here.
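The description above asks Privas to prevent record linkage. As a rough illustration of what that attack looks like in general (this is not Privas code), the Python sketch below joins a released table with an attacker's background knowledge on shared attributes. The function name `link_records`, the `name` field and all rows are hypothetical and were made up for this example.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def link_records(
    released: Sequence[Row],
    external: Sequence[Row],
    quasi_identifiers: Sequence[str],
    id_field: str = "name",
) -> Dict[str, List[Row]]:
    """Map each identified external row to the released rows that share
    all of its quasi-identifier values."""
    matches: Dict[str, List[Row]] = {}
    for person in external:
        key = tuple(person[q] for q in quasi_identifiers)
        matches[person[id_field]] = [
            row for row in released
            if tuple(row[q] for q in quasi_identifiers) == key
        ]
    return matches


# Hypothetical released table: names removed, other attributes left intact.
released = [
    {"age": "39", "sex": "Male", "occupation": "Adm-clerical"},
    {"age": "50", "sex": "Male", "occupation": "Exec-managerial"},
    {"age": "50", "sex": "Male", "occupation": "Sales"},
]

# Hypothetical background knowledge held by an attacker.
external = [
    {"name": "Bob", "age": "39", "sex": "Male"},
    {"name": "Carl", "age": "50", "sex": "Male"},
]

matches = link_records(released, external, ["age", "sex"])
for name, rows in matches.items():
    # Exactly one candidate means the person's record has been singled out.
    status = "re-identified" if len(rows) == 1 else f"{len(rows)} candidates"
    print(name, status)
```

A person who matches exactly one released row is re-identified, which is the situation a record-linkage requirement is meant to rule out.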
Run privas with census CSV
java -jar target/privas.jar /users/me/Downloads/census.csv
This example can then be executed with the command on the right.
With this specification, Privas will transform the CSV and anonymize the age information.
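This README does not say which transformation Privas applies to the age column. One common way to anonymize a numeric attribute is generalization, where exact values are replaced by intervals. The Python sketch below shows that idea only; the function name `generalize_age` and the interval width of 10 are assumptions made for illustration, not Privas behaviour.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the half-open interval [low-high) containing it."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if width < 1:
        raise ValueError("width must be at least 1")
    low = (age // width) * width
    return f"[{low}-{low + width})"


# Hypothetical ages, not taken from the census file.
ages = [23, 37, 39, 41]
generalized = [generalize_age(a) for a in ages]
print(generalized)
```

Wider intervals hide more individuals in each group but make the data less useful, which is the trade-off between privacy level and data usefulness that a PrivasL description is meant to control.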
The web platform
The Privas web platform was designed as a separate application.
Its purpose is to generate PrivasL files through a visual interface.
The platform can also load an existing PrivasL file and make changes to it.
Compilation
Compile the web platform
./gradlew build
The web platform should be compiled using the Gradle wrapper's build task. This will generate the executable needed to run the platform.
Running
Run the web platform
./gradlew bootRun
The platform should then be run using the Gradle bootRun task shown on the right.
Once running, the web platform should be available at:
Screenshots
Below are some screenshots of the web platform.
Ontology
During the research, the current state of the art in data privacy was analysed in depth. This analysis gave a better overview of how data privacy preservation has evolved and of the different ways to achieve it. Finally, to organize this knowledge, multiple ontologies were built. These ontologies represent the different concepts and relationships that exist in data privacy.
A simplified version of one full ontology is presented next. Links to the full ontology and to the two other ontologies produced are given at the end.
A simplified PPDP ontology

Other ontologies
Other information
Privas was developed as part of a master's thesis.
It was built as modular software to allow future extensions. Privas can be extended by:
- Adding more types of repositories (currently MySQL and CSV are available)
- Adding more types of privacy models (currently k-anonymity, l-diversity, t-closeness and ε-differential privacy are supported)
- Adding different techniques to the current or newly added privacy models
- Enhancing the decision engine that applies the privacy techniques
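For readers unfamiliar with the privacy models listed above, the Python sketch below checks the simplest of them, k-anonymity: every combination of quasi-identifier values must be shared by at least k records. It illustrates the standard definition and is not taken from the Privas code base; the function name `is_k_anonymous` and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence


def is_k_anonymous(
    rows: Sequence[Mapping[str, str]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> bool:
    """Return True if every quasi-identifier combination occurs in at least k rows.

    An empty table is trivially k-anonymous.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    group_sizes = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return all(size >= k for size in group_sizes.values())


# Hypothetical, already generalized rows.
rows = [
    {"age": "[30-40)", "sex": "Male", "occupation": "Sales"},
    {"age": "[30-40)", "sex": "Male", "occupation": "Tech-support"},
    {"age": "[40-50)", "sex": "Female", "occupation": "Sales"},
]

print(is_k_anonymous(rows, ["age", "sex"], k=2))  # the Female group has one row
print(is_k_anonymous(rows, ["sex"], k=1))
```

A new privacy model added as an extension would need a comparable check, plus a technique (such as generalization or suppression) that transforms the data until the check passes.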