DataGrid
Implementation of search, sorting & selection algorithms to build an efficient datagrid. Project developed for the Algorithm Design and Analysis course, taught by Professor Thiago Pinheiro de Araújo (FGV EMAp).
User Guide
The DataGrid class is specifically designed to work with datasets that follow the structure below:
Column | Data Type | Search Type | Extra |
---|---|---|---|
id | integer | exact | unique |
owner_id | string | exact | Exactly 5 alphanumeric characters |
creation_date | string | range | Format: YYYY-MM-DD hh:mm:ss |
count | integer | range | |
name | string | contains | Maximum length of 20 characters |
content | string | contains | |
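The three search types in the table can be illustrated with a minimal sketch. This is not the DataGrid implementation: the helper names `exact_search`, `contains_search` and `range_search` are hypothetical, rows are plain dictionaries, and the range bounds are assumed to be inclusive. Because `creation_date` uses the zero-padded format YYYY-MM-DD hh:mm:ss, comparing the strings lexicographically gives the same order as comparing the dates, so a range search can work directly on the strings.

```python
from bisect import bisect_left, bisect_right


def exact_search(rows, column, value):
    """Exact match: keep rows whose column equals value."""
    return [r for r in rows if r[column] == value]


def contains_search(rows, column, fragment):
    """Contains match: keep rows whose string column includes fragment."""
    return [r for r in rows if fragment in r[column]]


def range_search(sorted_rows, column, low, high):
    """Range match on rows already sorted ascending by column.

    Returns the rows with low <= row[column] <= high (inclusive bounds are
    an assumption of this sketch). The key list is rebuilt on every call
    for brevity; a real index would keep it cached.
    """
    keys = [r[column] for r in sorted_rows]
    return sorted_rows[bisect_left(keys, low):bisect_right(keys, high)]
```

Sorting the rows once by `count` or `creation_date` lets every later range query on that column locate both ends of the result with a binary search.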
Each record in the DataGrid is considered an Event.
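A record that follows this structure can be written as a dictionary keyed by column name. The sketch below checks such a dictionary against the constraints in the table. `validate_row` is a hypothetical helper, not part of the DataGrid module, and treating "alphanumeric" as ASCII letters and digits is an assumption. Uniqueness of `id` depends on the whole dataset, so it is not checked here.

```python
import re
from datetime import datetime

COLUMNS = ("id", "owner_id", "creation_date", "count", "name", "content")


def validate_row(row):
    """Return row unchanged if it satisfies the table's constraints.

    Raises ValueError otherwise. Hypothetical helper for illustration only.
    """
    missing = [c for c in COLUMNS if c not in row]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not isinstance(row["id"], int) or not isinstance(row["count"], int):
        raise ValueError("id and count must be integers")
    for column in ("owner_id", "creation_date", "name", "content"):
        if not isinstance(row[column], str):
            raise ValueError(f"{column} must be a string")
    # Assumption: alphanumeric means ASCII letters and digits.
    if not re.fullmatch(r"[A-Za-z0-9]{5}", row["owner_id"]):
        raise ValueError("owner_id must be exactly 5 alphanumeric characters")
    # strptime raises ValueError when the string does not parse in this format.
    datetime.strptime(row["creation_date"], "%Y-%m-%d %H:%M:%S")
    if len(row["name"]) > 20:
        raise ValueError("name must have at most 20 characters")
    return row


example_row = validate_row({
    "id": 1,
    "owner_id": "a1B2c",
    "creation_date": "2023-10-18 09:30:00",
    "count": 42,
    "name": "sample event",
    "content": "free text describing the event",
})
```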
To initialize the DataGrid class, simply import the module and instantiate the class. Make sure your script can access the folder where the DataGrid module is located, for example:
import sys
sys.path.append('src/')
from datagrid import DataGrid
Initialize the DataGrid class with:
datagrid = DataGrid()
The DataGrid class has the following methods:
- read_csv(file, sep = ',', encoding = 'utf-8'): populates the DataGrid from the CSV file at the given path, using the specified separator and encoding.
- show(start=0, end=100, prints = False, returns = True): displays the entries in the DataGrid in its current sorted state, limited to the range defined by start and end. With returns=True it returns the list of Event objects between start and end; with prints=True it prints the content of these objects.
- insert_row(row): inserts a new event into the DataGrid. It takes a dictionary containing the data of the event to be inserted and creates an Event instance from it. The dictionary must have the column names as keys and the data to be inserted as values, following the structure described in the table above.
- delete_row(column, value): removes events from the DataGrid. It takes the name of a column and the value to search for in that column, and removes all events that have that value in the specified column. If column = 'positions', it removes elements based on their position (index) in the table. In this case, value can be either a range given as a tuple (start, end) or a single positive integer.
- search(column, value): searches for events in the DataGrid. It takes the name of a column and the value to search for in that column, and returns a list of Event objects that contain the searched value in the specified column.
- sort(column, direction = 'asc'): sorts the DataGrid. It takes the name of a column and the sorting direction. To sort in descending order, pass direction = 'desc'.
- select_count(i, j, how = 'median-of-medians'): returns the list of Event objects between positions i and j in the table, considering the count column sorted in ascending order. This operation does not alter the internal structure of the DataGrid. Pass how = 'quickselect' or how = 'heapsort' to choose a different algorithm for the operation.
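The idea behind select_count can be sketched without the DataGrid module: select the records that would sit at positions i to j if the table were sorted by count, without sorting the whole table or modifying it. The sketch below is an illustration, not the project's code. It uses quickselect with a random pivot, `select_range` is a hypothetical name, and the half-open interval [i, j) is an assumption, since the description above does not state whether j is included. A median-of-medians variant would replace the random pivot with a deterministic one to guarantee linear time in the worst case.

```python
import random


def partition(items, lo, hi, key):
    """Lomuto partition of items[lo..hi] around a random pivot.

    Returns the pivot's final index: smaller keys end up to its left,
    greater-or-equal keys to its right.
    """
    p = random.randint(lo, hi)
    items[p], items[hi] = items[hi], items[p]
    pivot = key(items[hi])
    store = lo
    for k in range(lo, hi):
        if key(items[k]) < pivot:
            items[k], items[store] = items[store], items[k]
            store += 1
    items[store], items[hi] = items[hi], items[store]
    return store


def quickselect(items, k, lo, hi, key):
    """Rearrange items[lo..hi] in place so items[k] holds the element a
    full sort would put there (requires lo <= k <= hi)."""
    while lo < hi:
        p = partition(items, lo, hi, key)
        if k == p:
            return
        if k < p:
            hi = p - 1
        else:
            lo = p + 1


def select_range(records, i, j, key):
    """Return the records at positions i..j-1 of the ascending order by key.

    Works on a copy, so the input list is left untouched.
    """
    n = len(records)
    if not 0 <= i <= j <= n:
        raise ValueError("expected 0 <= i <= j <= len(records)")
    if i == j:
        return []
    work = list(records)
    quickselect(work, i, 0, n - 1, key)      # work[:i] now holds the i smallest
    quickselect(work, j - 1, i, n - 1, key)  # work[i:j] now holds the wanted slice
    return sorted(work[i:j], key=key)        # only the slice itself is sorted
```

With records stored as dictionaries, `select_range(records, 10, 25, key=lambda r: r["count"])` returns 15 records ordered by count. The expected cost is linear in the table size plus the cost of sorting the selected slice.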
The file demo.ipynb contains an example of how to use the DataGrid class with data randomly generated by the file dataGenerator.py. The comments on the operations performed in the notebook refer to the results obtained using the file fake_data_100.csv, which contains 100 rows.
Random Data Generation
If you want to generate random data to test the DataGrid module, simply run the file dataGenerator.py. Remember to adjust the value(s) in the n list at the end of the file to define how many files you want to generate and how many rows each should contain.
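For readers who want to see what such test data might look like, here is a minimal sketch of a generator that follows the column structure described in the User Guide. It is not the code in dataGenerator.py: the function names, value ranges, start date and the presence of a header row are all assumptions made for illustration.

```python
import csv
import io
import random
import string
from datetime import datetime, timedelta

COLUMNS = ["id", "owner_id", "creation_date", "count", "name", "content"]


def random_word(rng, min_len, max_len):
    """Random lowercase word with a length in [min_len, max_len]."""
    return "".join(rng.choices(string.ascii_lowercase, k=rng.randint(min_len, max_len)))


def random_rows(n, seed=None):
    """Yield n dictionaries that follow the documented column structure."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    start = datetime(2020, 1, 1)  # arbitrary start date (assumption)
    for i in range(n):
        moment = start + timedelta(seconds=rng.randrange(10**8))
        yield {
            "id": i,  # sequential, therefore unique
            "owner_id": "".join(rng.choices(alphabet, k=5)),
            "creation_date": moment.strftime("%Y-%m-%d %H:%M:%S"),
            "count": rng.randrange(1000),
            "name": random_word(rng, 1, 20),
            "content": " ".join(random_word(rng, 2, 8) for _ in range(rng.randint(3, 10))),
        }


def write_csv(handle, n, seed=None):
    """Write a header row followed by n random rows to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(random_rows(n, seed))


# Generate 100 rows in memory; pass an open file handle to write to disk instead.
buffer = io.StringIO()
write_csv(buffer, 100, seed=0)
```

Passing a fixed seed makes the output reproducible, which helps when comparing results across runs.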
Citation
@online{lamarca2023,
author = {Lamarca, Felipe and Larréa, Cristiano},
title = {DataGrid},
date = {2023-10-18},
url = {https://github.com/felipelmc/DataGrid},
langid = {en}
}