List of External Data Sources
Thanks to Marc Eskildsen
Title | Description | Authors | Link |
---|---|---|---|
Firm-Level Data | |||
Corporate Ownership Data | Detailed and adjusted U.S. firm-level ownership with PERMNO/CUSIP | Amir Amel-Zadeh, Fiona Kasperk, Martin C. Schmalz | https://corporateownershipdata.com/research/ |
Cost of Capital | The cost of capital project aims to understand how firms’ perceived cost of capital and corporate discount rates are determined, develop over time, and influence corporate investment | Kilian Huber, Niels Gormsen | https://costofcapital.org/ |
Firm-Level Risk | Firm-level risk and sentiment scores (overall, political, non-political, topic-specific etc.) extracted from quarterly earnings calls. With GVKEY | Hassan, Hollander, van Lent, Tahoun and others | https://sites.google.com/view/firmrisk/ |
Proxy Monitor | Publicly available database that tracks shareholder proposals in real time | | https://proxymonitor.org/ |
Open Source Asset Pricing | Equity factor replication including underlying signals with PERMNOs | Chen and Zimmermann | https://www.openassetpricing.com/ |
Is There a Replication Crisis in Finance | Replication code for clean equity data, including factors | Jensen, Kelly, Pedersen | https://github.com/bkelly-lab/ReplicationCrisis |
Grigory Vilkov’s Option Repository | Replication code for many option characteristics, including Ian Martin’s SVIX, the Martin-Wagner measure, risk-neutral skewness, put option slopes, etc. | Grigory Vilkov | https://www.vilkov.net/codedata.html |
Serhiy Kozak’s Data Library | Equity anomaly portfolios and underlying firm signals | Serhiy Kozak | https://sites.google.com/site/serhiykozak/data |
Hoberg-Phillips Data Library | Text-based Network Industry Classifications (TNIC) using similarity scores of firms’ 10-K product descriptions. The library also includes industry concentration and other measures based on the TNIC | Hoberg-Phillips | https://hobergphillips.tuck.dartmouth.edu/ |
LobbyView | Firm-level lobbying information with amounts spent (with GVKEY, unlike OpenSecrets). Also includes bill information etc. | In Song Kim and others | https://www.lobbyview.org/data-download/datasets/ |
OpenSecrets | US election, lobbying, and fundraising data (no identifier linking to CRSP/Compustat) | | https://www.opensecrets.org/ |
Founding Patents | Founding years for 142,110 U.S.-based assignees of USPTO patents granted between 1975 and 2021. This represents 85% of patents and 73% of assignees. The file also links to startups in PitchBook, a leading commercial data provider on venture capital and private equity. | | https://foundingpatents.com/data/ |
DISCERN: Duke Innovation & SCientific Enterprises Research Network | Database linking innovation data to Compustat firms (update expected end of 2023) | Ashish Arora, Sharon Belenzon, Lia Sheer | https://zenodo.org/records/4320782 https://discern-project.com/ |
Joseph Kalmenovitz’s Regulatory Data | Firm-level measures of regulatory fragmentation, similarity, intensity and exposure (with GVKEY/PERMNO) | | https://sites.google.com/view/jkalmenovitz/home |
Climate Finance | |||
Sentometrics Research | Media Climate Change Concerns Index | Ardia, D., Bluteau, K., Boudt, K., Inghelbrecht, K. | https://sentometrics-research.com/#download |
IMF | Country-level climate change indicators | | https://climatedata.imf.org/ |
Biodiversity Risk | Aggregated and disaggregated measures of biodiversity risk | Giglio, Stroebel, Kuchler, Zeng | https://www.biodiversityrisk.org/ |
Firm-level climate change exposure | Firm-level climate change exposure from earnings conference calls | van Lent et al. | https://osf.io/fd6jq/ |
Economic Data and Indicators | |||
The Global Credit Project | The Global Credit Project provides the world’s most comprehensive historical statistics on credit markets covering 120 advanced and emerging economies going back to 1940. Based on archival and manually collected data, we provide information on both aggregate private credit and disaggregated credit by both industry and by type of household credit. | Karsten Müller and Emil Verner | https://www.globalcreditproject.com/ |
Economic Policy Uncertainty Index | Country- and categorical-level measures of uncertainty | Baker, Bloom, Davis and others | https://www.policyuncertainty.com/index.html |
Risk Aversion Index | Financial Proxies to Risk Aversion and Economic Uncertainty | Bekaert, Engstrom, Xu | https://www.nancyxu.net/risk-aversion-index |
The U.S. Treasury Yield Curve: 1961 to the Present | Treasury yield curve estimates of the Federal Reserve Board at a daily frequency from 1961 to the present | Refet S. Gurkaynak, Brian Sack, and Jonathan H. Wright | https://www.federalreserve.gov/pubs/feds/2006/200628/200628abs.html |
Jay Ritter’s IPO data | IPO and SPAC statistics | Jay Ritter | https://site.warrington.ufl.edu/ritter/ipo-data/ |
Federal Reserve Economic Data (many underlying indicators) | |||
US FED | Example: new measure of U.S. financial conditions (https://www.federalreserve.gov/econres/notes/feds-notes/a-new-index-to-measure-us-financial-conditions-20230630.html) | | https://www.federalreserve.gov/data.htm |
New York FED | Example: oil price decomposition into supply and demand factors (https://www.newyorkfed.org/research/policy/oil_price_dynamics_report#/overview) | | https://www.newyorkfed.org/research/data_indicators |
St. Louis FED | | | https://fred.stlouisfed.org/ |
Philadelphia FED | | | https://www.philadelphiafed.org/surveys-and-data |
Government Data | |||
U.S. Government’s Open Data | | | https://data.gov/ |
OECD | Examples: Foreign Direct Investment (FDI) Restrictiveness Index (https://www.oecd.org/investment/fdiindex.htm), Product Market Regulation (PMR) indices (https://www.oecd.org/economy/reform/indicators-of-product-market-regulation/), and Environmental Policy Stringency (EPS) indices (https://stats.oecd.org/Index.aspx?DataSetCode=EPS) | | https://data.oecd.org/ |
World Bank | | | https://datacatalog.worldbank.org/home |
WITS (World Integrated Trade Solution, part of World Bank) | Country-level trade indicators, including Herfindahl-Hirschman indices | | https://wits.worldbank.org/module/ALL/sub-module/ALL/reporter/ALL/year/ALL/tradeflow/ALL/pagesize/50/page/1 |
U.S. Bureau of Labor Statistics (BLS) | Example: NAICS-level industry gross output and value added (https://www.bls.gov/productivity/tables/) | | https://www.bls.gov/ |
U.S. Census Bureau | Census data and industry-level information | | https://www.census.gov/ |
Governmental, Regulatory and Political | |||
Manifesto Project | Political party preferences in more than 50 countries | | https://manifestoproject.wzb.eu/ |
Environmental Protection Agency (EPA) | Facility-level Toxics Release Inventory (TRI) and Greenhouse Gas (GHG) emissions data. No identifier linking to CRSP/Compustat, but some published papers (e.g. JF 2023 “The Pollution Premium”) have made their GVKEY bridge for TRI data available | | https://www.epa.gov/toxics-release-inventory-tri-program/tri-data-and-tools https://www.epa.gov/ghgreporting/data-sets |
Corporate Prosecution Registry | Detailed information about every federal organizational prosecution since 2001, as well as deferred and non-prosecution agreements with organizations since 1990. Includes TICKER for public companies | University of Virginia School of Law and Duke University School of Law | https://corporate-prosecution-registry.com/ |
Federal Election Commission (FEC) | U.S. campaign finance data | | https://www.fec.gov/data/browse-data/?tab=bulk-data |
Voteview | US political ideology and related data for presidents, senators, and House members | | https://voteview.com/data |
QuantGov | Counts regulatory restrictions in the U.S. Code of Federal Regulations (CFR), then attributes those restrictions to the agencies and departments that promulgated them and to the affected industries at the 2-, 3-, and 4-digit NAICS levels | | https://www.quantgov.org/ |
Software Repositories | |||
University of Notre Dame | Loughran-McDonald dictionary, cleaned SEC/EDGAR 10-X data (including historical headquarters locations etc.) | Loughran-McDonald | https://sraf.nd.edu/ |
PoliSciData | Political Science Data | | https://www.poliscidata.com/pages/statisticalDataCollections.php?SUBFIELD= |
Database on Ideology, Money in Politics, and Elections (DIME) | Database intended to make data on campaign finance and elections (1) more centralized and accessible, (2) easier to work with, and (3) more versatile in terms of the types of questions that can be addressed | Stanford | https://data.stanford.edu/dime |
Search Engines for Replication Code or Data | |||
Harvard Dataverse | | | https://dataverse.harvard.edu/ |
Open Science Framework (OSF) | | | https://osf.io/ |
U.S. Government’s Open Data | | | https://data.gov/ |