List of External Data Sources

Thanks to Marc Eskildsen

Firm-Level Data
Corporate Ownership Data | Detailed and adjusted U.S. firm-level ownership with PERMNO/CUSIP | Amir Amel-Zadeh, Fiona Kasperk, Martin C. Schmalz
Cost of Capital | The cost of capital project aims to understand how firms’ perceived cost of capital and corporate discount rates are determined, develop over time, and influence corporate investment | Kilian Huber, Niels Gormsen
Firm-Level Risk | Firm-level risk and sentiment scores (overall, political, non-political, topic-specific, etc.) extracted from quarterly earnings calls, with GVKEY | Hassan, Hollander, van Lent, Tahoun and others

Proxy Monitor | Publicly available database that tracks shareholder proposals in real time
Open Source Asset Pricing | Equity factor replication including underlying signals with PERMNOs | Chen and Zimmermann
Is There a Replication Crisis in Finance? | Replication code for clean equity data, including factors | Jensen, Kelly, Pedersen
Grigory Vilkov’s Option Repository | Replication code for many option characteristics, including Ian Martin’s SVIX, the Martin-Wagner measure, risk-neutral skewness, put option slopes, etc. | Grigory Vilkov
Serhiy Kozak’s Data Library | Equity anomaly portfolios and underlying firm signals | Serhiy Kozak
Hoberg-Phillips Data Library | Text-based Network Industry Classifications (TNIC) using similarity scores of firms’ 10-K product descriptions. The library also includes industry concentration and other measures based on the TNIC | Hoberg-Phillips
LobbyView | Firm-level lobbying information with amounts spent (with GVKEY, unlike OpenSecrets). Also includes bill information, etc. | In Song Kim and others
OpenSecrets | U.S. election, lobbying, and fundraising data (no identifier for linking to CRSP/Compustat)
Founding Patents | Founding years for 142,110 U.S.-based assignees of USPTO patents granted between 1975 and 2021. This represents 85% of patents and 73% of assignees. The file also links to startups in PitchBook, a leading commercial data provider on venture capital and private equity.
DISCERN: Duke Innovation & SCientific Enterprises Research Network | Database linking innovation data to Compustat firms (update expected at the end of 2023) | Arora Ashish, Belenzon Sharon, Sheer Lia
Joseph Kalmenovitz’s Regulatory Data | Firm-level measures of regulatory fragmentation, similarity, intensity and exposure (with GVKEY/PERMNO)
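Several of the sources above ship with GVKEY or PERMNO identifiers, which is what makes them mergeable with CRSP/Compustat. As a minimal sketch of such a merge, assuming two CSV extracts keyed on GVKEY and year: the column names (`political_risk`, `total_assets`) and all rows below are hypothetical and only illustrate the join, not any provider's actual file layout.

```python
import csv
import io

# Hypothetical extracts; column names and values are illustrative only.
RISK_CSV = """gvkey,year,political_risk
001001,2020,0.8
001002,2020,0.3
"""

FUNDAMENTALS_CSV = """gvkey,year,total_assets
001001,2020,1500
001003,2020,900
"""


def read_keyed(text, key_fields):
    """Index CSV rows by a tuple of key fields.

    Keys are kept as strings so that leading zeros in GVKEY survive.
    """
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows[tuple(row[k] for k in key_fields)] = row
    return rows


def inner_join(left_text, right_text, key_fields=("gvkey", "year")):
    """Keep only firm-years that appear in both extracts."""
    left = read_keyed(left_text, key_fields)
    right = read_keyed(right_text, key_fields)
    return [{**left[k], **right[k]} for k in left if k in right]


merged = inner_join(RISK_CSV, FUNDAMENTALS_CSV)
for row in merged:
    print(row)
```

Reading identifiers as strings rather than integers is the main design choice here: parsing `001001` as a number would drop the leading zeros and silently break the match.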
Climate Finance
Sentometrics Research | Media Climate Change Concerns Index | Ardia, D., Bluteau, K., Boudt, K., Inghelbrecht, K.
IMF | Country-level climate change indicators
Biodiversity Risk | Aggregated and disaggregated measures of biodiversity risk | Giglio, Stroebel, Kuchler, Zeng
Firm-level climate change exposure | Firm-level climate change exposure from earnings conference calls | van Lent et al.
Economic Data and Indicators
The Global Credit Project | The Global Credit Project provides the world’s most comprehensive historical statistics on credit markets covering 120 advanced and emerging economies going back to 1940. Based on archival and manually collected data, we provide information on both aggregate private credit and disaggregated credit by both industry and by type of household credit. | Karsten Müller and Emil Verner
Economic Policy Uncertainty Index | Country- and categorical-level measures of uncertainty | Baker, Bloom, Davis and others
Risk Aversion Index | Financial Proxies to Risk Aversion and Economic Uncertainty | Bekaert, Engstrom, Xu

The U.S. Treasury Yield Curve: 1961 to the Present | Treasury yield curve estimates of the Federal Reserve Board at a daily frequency from 1961 to the present | Refet S. Gurkaynak, Brian Sack, and Jonathan H. Wright
Jay Ritter’s IPO data | IPO and SPAC statistics | Jay Ritter
Federal Reserve Economic Data (many underlying indicators)
US FED | Example: new measure of U.S. financial conditions
New York FED | Example: oil price decomposition into supply and demand factors
St. Louis FED
Philadelphia FED
Government Data
U.S. Government’s Open Data
OECD | Examples: Foreign Direct Investment (FDI) Restrictiveness Index, Product Market Regulation (PMR) indices, and Environmental Policy Stringency (EPS) indices
World Bank
WITS (World Integrated Trade Solution, part of World Bank) | Country-level trade indicators, including Herfindahl-Hirschman indices
U.S. Bureau of Labor Statistics (BLS) | Example: NAICS-level industry gross output and value added
U.S. Census Bureau | Census data and industry-level information
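For readers unfamiliar with the Herfindahl-Hirschman index mentioned in the WITS entry above, the plain form is the sum of squared shares, HHI = sum over i of s_i^2, which runs from 1/N (N equal shares) to 1 (full concentration). The sketch below computes that unnormalized form; the partner labels and export values are hypothetical, and the exact definition WITS uses (for example, any normalization) should be checked against its own documentation.

```python
def herfindahl_hirschman(values):
    """Return the sum of squared shares for a list of non-negative values."""
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return sum((v / total) ** 2 for v in values)


# Hypothetical export values by destination market (illustrative numbers only).
exports = {"Partner A": 50.0, "Partner B": 30.0, "Partner C": 20.0}

hhi = herfindahl_hirschman(list(exports.values()))
print(round(hhi, 4))
```

Shares are computed inside the function, so the inputs can be raw trade values rather than percentages.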
Governmental, Regulatory and Political
Manifesto Project | Political party preferences in more than 50 countries
Environmental Protection Agency (EPA) | Facility-level Toxics Release Inventory (TRI) and Greenhouse Gas (GHG) emissions data. No identifier for linking to CRSP/Compustat, but some published papers (e.g. JF 2023 “The Pollution Premium”) have made their GVKEY bridge for TRI data available
Corporate Prosecution Registry | Detailed information about every federal organizational prosecution since 2001, as well as deferred and non-prosecution agreements with organizations since 1990. Includes TICKER for public companies | University of Virginia School of Law and Duke University School of Law
Federal Election Commission (FEC) | U.S. campaign finance data
Voteview | U.S. political ideology and related data for presidents, senators, and members of the House of Representatives
QuantGov | Counts regulatory restrictions in the U.S. Code of Federal Regulations (CFR), then attributes those restrictions to the agencies and departments that promulgated them and to the affected industries at the 2-, 3-, and 4-digit NAICS levels
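To make the QuantGov entry above more concrete, here is a rough sketch of what counting restrictions and attributing them to agencies could look like. The term list, the agency labels, and the sample sentences are assumptions made for illustration; QuantGov's actual term list and attribution method should be taken from its own documentation.

```python
import re
from collections import Counter

# Assumed term list for illustration only; not confirmed by the source.
RESTRICTION_TERMS = ("shall", "must", "may not", "required", "prohibited")

# Match whole words or phrases, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in RESTRICTION_TERMS) + r")\b",
    re.IGNORECASE,
)


def count_restrictions(text):
    """Count occurrences of each restriction term in a piece of text."""
    return Counter(match.group(1).lower() for match in _PATTERN.finditer(text))


def restrictions_by_agency(parts):
    """Sum restriction counts over (agency, text) pairs."""
    totals = Counter()
    for agency, text in parts:
        totals[agency] += sum(count_restrictions(text).values())
    return totals


# Hypothetical regulation snippets tagged with made-up agency labels.
sample_parts = [
    ("Agency X", "Facilities shall report annually. Venting is prohibited."),
    ("Agency Y", "Issuers must file. Records may not be destroyed."),
    ("Agency X", "Operators Shall keep logs."),
]

totals = restrictions_by_agency(sample_parts)
print(dict(totals))
```

The same aggregation works for industries: replacing the agency label with a NAICS code in each pair gives counts per industry.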
Software Repositories
University of Notre Dame | Loughran-McDonald dictionary, cleaned SEC/EDGAR 10-X data (including historical headquarters location, etc.) | Loughran-McDonald
PoliSciData | Political Science Data
Database on Ideology, Money in Politics, and Elections (DIME) | Database intended to make data on campaign finance and elections (1) more centralized and accessible, (2) easier to work with, and (3) more versatile in terms of the types of questions that can be addressed | Stanford
Search Engines for Replication Code or Data
Harvard Dataverse
Open Science Framework (OSF)
