List of External Data Sources

Thanks to Marc Eskildsen

Title | Description | Authors | Link
Firm-Level Data
Corporate Ownership Data | Detailed and adjusted U.S. firm-level ownership with PERMNO/CUSIP | Amir Amel-Zadeh, Fiona Kasperk, Martin C. Schmalz | https://corporateownershipdata.com/research/
Cost of Capital | The cost of capital project aims to understand how firms’ perceived cost of capital and corporate discount rates are determined, develop over time, and influence corporate investment | Kilian Huber, Niels Gormsen | https://costofcapital.org/
Firm-Level Risk | Firm-level risk and sentiment scores (overall, political, non-political, topic-specific etc.) extracted from quarterly earnings calls. With GVKEY | Hassan, Hollander, van Lent, Tahoun and others | https://sites.google.com/view/firmrisk/

Proxy Monitor | Publicly available database that tracks shareholder proposals in real time | | https://proxymonitor.org/
Open Source Asset Pricing | Equity factor replication including underlying signals with PERMNOs | Chen and Zimmermann | https://www.openassetpricing.com/
Is There a Replication Crisis in Finance | Replication code for clean equity data, including factors | Jensen, Kelly, Pedersen | https://github.com/bkelly-lab/ReplicationCrisis
Grigory Vilkov’s Option Repository | Replication code for many option characteristics, including Ian Martin’s SVIX, the Martin-Wagner measure, risk-neutral skewness, put option slopes, etc. | Grigory Vilkov | https://www.vilkov.net/codedata.html
Serhiy Kozak’s Data Library | Equity anomaly portfolios and underlying firm signals | Serhiy Kozak | https://sites.google.com/site/serhiykozak/data
Hoberg-Phillips Data Library | Text-based Network Industry Classifications (TNIC) built from similarity scores of firms’ 10-K product descriptions. The library also includes industry concentration and other measures based on the TNIC | Hoberg-Phillips | https://hobergphillips.tuck.dartmouth.edu/
LobbyView | Firm-level lobbying information with amounts spent (with GVKEY, unlike OpenSecrets). Also includes bill information etc. | In Song Kim and others | https://www.lobbyview.org/data-download/datasets/
OpenSecrets | US election, lobbying, fundraising data (no identifier with CRSP/Compustat) | | https://www.opensecrets.org/
Founding Patents | Founding years for 142,110 U.S.-based assignees of USPTO patents granted between 1975 and 2021. This represents 85% of patents and 73% of assignees. The file also links to startups in PitchBook, a leading commercial data provider on venture capital and private equity. | | https://foundingpatents.com/data/
DISCERN: Duke Innovation & SCientific Enterprises Research Network | Database linking innovation data to Compustat firms (update expected end of 2023) | Arora Ashish, Belenzon Sharon, Sheer Lia | https://zenodo.org/records/4320782, https://discern-project.com/
Joseph Kalmenovitz’ Regulatory Data | Firm-level measures of regulatory fragmentation, similarity, intensity and exposure (with GVKEY/PERMNO) | | https://sites.google.com/view/jkalmenovitz/home
Climate Finance
Sentometrics Research | Media Climate Change Concerns Index | Ardia, D., Bluteau, K., Boudt, K., Inghelbrecht, K. | https://sentometrics-research.com/#download
IMF | Country-level climate change indicators | | https://climatedata.imf.org/
Biodiversity Risk | Aggregated and disaggregated measures of biodiversity risk | Giglio, Stroebel, Kuchler, Zeng | https://www.biodiversityrisk.org/
Firm-level climate change exposure | Firm-level climate change exposure from earnings conference calls | van Lent et al. | https://osf.io/fd6jq/
Economic Data and Indicators
The Global Credit Project | The Global Credit Project provides the world’s most comprehensive historical statistics on credit markets covering 120 advanced and emerging economies going back to 1940. Based on archival and manually collected data, we provide information on both aggregate private credit and disaggregated credit by both industry and by type of household credit. | Karsten Müller and Emil Verner | https://www.globalcreditproject.com/
Economic Policy Uncertainty Index | Country- and categorical-level measures of uncertainty | Baker, Bloom, Davis and others | https://www.policyuncertainty.com/index.html
Risk Aversion Index | Financial Proxies to Risk Aversion and Economic Uncertainty | Bekaert, Engstrom, Xu | https://www.nancyxu.net/risk-aversion-index

The U.S. Treasury Yield Curve: 1961 to the Present | Treasury yield curve estimates of the Federal Reserve Board at a daily frequency from 1961 to the present | Refet S. Gurkaynak, Brian Sack, and Jonathan H. Wright | https://www.federalreserve.gov/pubs/feds/2006/200628/200628abs.html
Jay Ritter’s IPO data | IPO and SPAC statistics | Jay Ritter | https://site.warrington.ufl.edu/ritter/ipo-data/
Federal Reserve Economic Data (many underlying indicators)
US FED | Example: new measure of U.S. financial conditions (https://www.federalreserve.gov/econres/notes/feds-notes/a-new-index-to-measure-us-financial-conditions-20230630.html) | | https://www.federalreserve.gov/data.htm
New York FED | Example: oil price decomposition into supply and demand factors (https://www.newyorkfed.org/research/policy/oil_price_dynamics_report#/overview) | | https://www.newyorkfed.org/research/data_indicators
St. Louis FED | | | https://fred.stlouisfed.org/
Philadelphia FED | | | https://www.philadelphiafed.org/surveys-and-data
Government Data
U.S. Government’s Open Data | | | https://data.gov/
OECD | Examples: Foreign Direct Investment (FDI) Restrictiveness Index (https://www.oecd.org/investment/fdiindex.htm), Product Market Regulation (PMR) indices (https://www.oecd.org/economy/reform/indicators-of-product-market-regulation/), and Environmental Policy Stringency (EPS) indices (https://stats.oecd.org/Index.aspx?DataSetCode=EPS) | | https://data.oecd.org/
World Bank | | | https://datacatalog.worldbank.org/home
WITS (World Integrated Trade Solution, part of World Bank) | Country-level trade indicators, including Herfindahl-Hirschman indices | | https://wits.worldbank.org/module/ALL/sub-module/ALL/reporter/ALL/year/ALL/tradeflow/ALL/pagesize/50/page/1
U.S. Bureau of Labor Statistics (BLS) | Example: NAICS-level industry gross output and value added (https://www.bls.gov/productivity/tables/) | | https://www.bls.gov/
U.S. Census Bureau | Census data and industry-level information | | https://www.census.gov/
Governmental, Regulatory and Political
Manifesto Project | Political party preferences in more than 50 countries | | https://manifestoproject.wzb.eu/
Environmental Protection Agency (EPA) | Facility-level Toxics Release Inventory (TRI) and Greenhouse Gas (GHG) emissions data. No identifier with CRSP/Compustat, but some published papers (e.g. JF 2023 “The Pollution Premium”) have made their GVKEY bridge for TRI data available | | https://www.epa.gov/toxics-release-inventory-tri-program/tri-data-and-tools, https://www.epa.gov/ghgreporting/data-sets
Corporate Prosecution Registry | Detailed information about every federal organizational prosecution since 2001, as well as deferred and non-prosecution agreements with organizations since 1990. Includes TICKER for public companies | University of Virginia School of Law and Duke University School of Law | https://corporate-prosecution-registry.com/
Federal Election Commission (FEC) | U.S. campaign finance data | | https://www.fec.gov/data/browse-data/?tab=bulk-data
Voteview | US political ideology and related data for presidents, senators and House members | | https://voteview.com/data
QuantGov | Counts regulatory restrictions in the U.S. Code of Federal Regulations (CFR) and then attributes those restrictions to the authoring agencies and departments that promulgated them and the NAICS 2-3-4 industries that are affected by them | | https://www.quantgov.org/
Software Repositories
University of Notre Dame | Loughran-McDonald dictionary, cleaned SEC/EDGAR 10-X data (including historical headquarter location etc.) | Loughran-McDonald | https://sraf.nd.edu/
PoliSciData | Political Science Data | | https://www.poliscidata.com/pages/statisticalDataCollections.php?SUBFIELD=
Database on Ideology, Money in Politics, and Elections (DIME) | Intended to make data on campaign finance and elections (1) more centralized and accessible, (2) easier to work with, and (3) more versatile in terms of the types of questions that can be addressed | Stanford | https://data.stanford.edu/dime
Search Engines for Replication Code or Data
Harvard Dataverse | | | https://dataverse.harvard.edu/
Open Science Framework (OSF) | | | https://osf.io/
U.S. Government’s Open Data | | | https://data.gov/
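
Several entries above note whether a dataset ships with a CRSP/Compustat identifier (PERMNO, GVKEY) or needs a separate bridge file, as with the EPA TRI data. The sketch below shows one way such a bridge could be applied. Everything in it is hypothetical: the column names (`facility_id`, `releases_lbs`, `gvkey`), the inline CSV contents, and the helper functions are illustrative assumptions, not the layout of any dataset listed here.

```python
import csv
import io

# Hypothetical facility-level records with no Compustat identifier.
FACILITY_CSV = """facility_id,year,releases_lbs
F001,2020,1200
F002,2020,300
F003,2020,50
"""

# Hypothetical bridge file mapping facilities to GVKEY.
# GVKEY is kept as a string so leading zeros survive.
BRIDGE_CSV = """facility_id,gvkey
F001,001234
F002,001234
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_gvkey(facility_rows, bridge_rows):
    """Left join: keep every facility row; gvkey is None when unmatched."""
    lookup = {row["facility_id"]: row["gvkey"] for row in bridge_rows}
    return [{**row, "gvkey": lookup.get(row["facility_id"])}
            for row in facility_rows]


def firm_year_totals(linked_rows):
    """Sum releases per (gvkey, year), skipping unmatched facilities."""
    totals = {}
    for row in linked_rows:
        if row["gvkey"] is None:
            continue
        key = (row["gvkey"], int(row["year"]))
        totals[key] = totals.get(key, 0.0) + float(row["releases_lbs"])
    return totals


linked = attach_gvkey(read_rows(FACILITY_CSV), read_rows(BRIDGE_CSV))
totals = firm_year_totals(linked)
print(totals)
```

Keeping unmatched facilities in the joined rows, instead of dropping them at the join, makes it easy to check how much of the raw data the bridge actually covers before aggregating to the firm level.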