DataGrid

Algorithms

Implementation of search, sorting & selection algorithms to build an efficient datagrid. Project developed within the scope of the Algorithm Design and Analysis discipline, lectured by Professor Thiago Pinheiro de Araújo (FGV EMAp).

Authors

Felipe Lamarca

Cristiano Larréa

Published

October 18, 2023

User Guide

The DataGrid class is specifically designed to work with datasets that follow the structure below:

Column Data Type Search Type Extra
id integer exact unique
owner_id string exact Exactly 5 alphanumeric characters
creation_date string range Format: YYYY-MM-DD hh:mm:ss
count integer range
name string contains Maximum length of 20 characters
content string contains

Each record in the DataGrid is considered an Event.

To initialize the DataGrid class, simply import the module and instantiate the class. Make sure your script can access the folder where the DataGrid module is located, for example:

import sys
sys.path.append('src/')

from datagrid import DataGrid

Initialize the DataGrid class with:

datagrid = DataGrid()

The DataGrid class has the following methods:

  • read_csv(file, sep = ',', encoding = 'utf-8'): populates the DataGrid from the data in the CSV file whose path is provided as a parameter, considering the specified separator and encoding;

  • show(start=0, end=100, prints = False, returns = True): displays the entries in the DataGrid, limiting the display to the range defined by the parameters. returns=True returns the list of Event objects between start and end, and prints=True shows the content of these objects. It displays the table in its current sorted state.

  • insert_row(row): inserts new events into the DataGrid. It takes a dictionary containing the data of the event to be inserted and creates an Event instance from this data. The dictionary must have the column names as keys and the data to be inserted as values, following the structure described in the table above.

  • delete_row(column, value): removes events from the DataGrid. It takes the name of the column and the value to search for in that column. It removes all events that have the searched value in the specified column. If column = 'positions', it removes elements based on their position (index) in the table. In this case, value can be either a range identified by a tuple (start, end) or a single positive integer.

  • search(column, value): searches for events in the DataGrid. It takes the name of the column and the value to search for in that column. It returns a list of Event objects that contain the searched value in the specified column.

  • sort(column, direction = 'asc'): sorts the DataGrid. It takes the name of the column and the sorting direction. To sort in descending order, simply pass direction = 'desc'.

  • select_count(i, j, how = 'median-of-medians'): returns the list of Event objects between positions i and j in the table, considering the count column sorted in ascending order. This operation does not alter the internal structure of the DataGrid. It is also possible to pass the parameter how = 'quickselect' or how = 'heapsort' to choose which algorithm will be used to perform the operation.

The file demo.ipynb contains an example of how to use the DataGrid class with data randomly generated by the file dataGenerator.py. The comments on the operations performed in the notebook refer to the results obtained using the file fake_data_100.csv, which contains 100 rows.

Random Data Generation

If you want to generate random data to test the DataGrid module, simply run the file dataGenerator.py. Remember to adjust the value(s) in the n list at the end of the file to define how many files you want to generate and how many rows each should contain.

Citation

BibTeX citation:
@online{lamarca2023,
  author = {Lamarca, Felipe and Larréa, Cristiano},
  title = {DataGrid},
  date = {2023-10-18},
  url = {https://github.com/felipelmc/DataGrid},
  langid = {en}
}
For attribution, please cite this work as:
Lamarca, Felipe, and Cristiano Larréa. 2023. “DataGrid.” October 18, 2023. https://github.com/felipelmc/DataGrid.