Project Overview
This project analyzes the web logs of EDGAR (the Electronic Data Gathering, Analysis, and Retrieval system), which record requests made to the SEC's EDGAR database by users worldwide. The logs are large, often about 2 GB of uncompressed data per day, and contain anonymized but valuable data on user behavior.
The primary goal is to develop Python tools that extract and process information from these logs to better understand user behavior, particularly that of investment firms. This analysis can provide insights into which companies or industries hedge funds may be considering for investment and whether they rely more on automated or manual research for their trading decisions.
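One simple way to approach the automated-versus-manual question is to look at how many requests a single anonymized IP makes in a day. The sketch below is only an illustration of that idea, not the method used in the notebook: the function name `classify_ips`, the `daily_threshold` parameter, and the sample IPs are all hypothetical, and the cutoff would need to be tuned against real data.

```python
from collections import Counter


def classify_ips(requests, daily_threshold=500):
    """Label each IP as 'automated' or 'manual' by its busiest day.

    `requests` is an iterable of (ip, date) tuples, one per log entry.
    `daily_threshold` is an assumed cutoff, not a value from the project.
    """
    # Count requests for every (ip, date) pair.
    per_ip_day = Counter(requests)

    # Keep the highest single-day count seen for each IP.
    peak = {}
    for (ip, _date), n in per_ip_day.items():
        peak[ip] = max(peak.get(ip, 0), n)

    return {
        ip: ("automated" if n > daily_threshold else "manual")
        for ip, n in peak.items()
    }


# Hypothetical sample: one busy IP and one quiet IP on the same day.
sample_requests = (
    [("101.81.76.dfd", "2017-01-01")] * 5
    + [("107.23.85.jfd", "2017-01-01")] * 2
)
labels = classify_ips(sample_requests, daily_threshold=3)
print(labels)
```

A fixed threshold is a crude heuristic; request timing or the proportion of index-page hits could be better signals, but those are beyond this sketch.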
Project Components
- main.ipynb: A Jupyter notebook used for the analysis of user behavior based on the extracted data.
- : A Python module for extracting data from the EDGAR filings.
Motivation
Understanding the behavior of users who access the EDGAR database can offer predictive insights into market trends and investment patterns. This project aims to uncover these patterns by analyzing which documents are accessed and how that access correlates with investment success.
Usage
Open the main.ipynb notebook in a Jupyter environment to view the analysis:
jupyter notebook main.ipynb
Data
The data used in this project consists of anonymized logs from the EDGAR database, which include user interactions such as the types of filings accessed. These logs are structured to provide insights into user demographics and behavior without compromising individual privacy.
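As a rough illustration of working with such logs, the sketch below counts successful requests per company identifier using only the Python standard library. It assumes the logs are CSV files with columns such as `ip`, `date`, `time`, `cik`, `accession`, `extention`, and `code`; this layout, the sample rows, and the function name `count_requests_by_cik` are assumptions made for the example and may not match the module in this project.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the assumed CSV layout (not real log data).
SAMPLE_LOG = """ip,date,time,cik,accession,extention,code
101.81.76.dfd,2017-01-01,00:00:00,1515671,0001515671-16-000245,-index.htm,200
101.81.76.dfd,2017-01-01,00:00:03,1515671,0001515671-16-000245,.txt,200
107.23.85.jfd,2017-01-01,00:00:05,1000180,0001000180-16-000068,-index.htm,404
107.23.85.jfd,2017-01-01,00:00:09,1000180,0001000180-16-000068,.txt,200
"""


def count_requests_by_cik(log_file):
    """Count successful (HTTP 200) requests for each CIK in a log file."""
    reader = csv.DictReader(log_file)
    counts = Counter()
    for row in reader:
        # Skip failed requests so errors do not inflate a company's count.
        if row["code"] == "200":
            counts[row["cik"]] += 1
    return counts


counts = count_requests_by_cik(io.StringIO(SAMPLE_LOG))
print(counts.most_common())
```

For a real daily file, the same function could be given an open file handle instead of the in-memory sample; reading row by row keeps memory use low even for multi-gigabyte logs.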
Acknowledgments
- SEC for providing the EDGAR logs.
- All contributors who have worked on analyzing public data for academic and professional purposes.
License
Distributed under the MIT License. See LICENSE.txt for more information.
