Country Activity Tracker (CAT): Artificial Intelligence

CAT presents data related to countries' artificial intelligence (AI) ecosystems to give an overview of domestic capabilities as well as insights on competitiveness and collaboration globally. It presents metrics on AI research, patents, and investment-related activities for AI overall and its various subfields. Read more about our approach.

Methodology

Selection of Indicators

The data for this interface is drawn from three CSET datasets:

  • AI research - CSET merged corpus of scholarly literature including Digital Science Dimensions, Clarivate’s Web of Science, Microsoft Academic Graph, China National Knowledge Infrastructure, arXiv, and Papers With Code;1
  • AI patents - CSET unified patents dataset of merged patent data from 1790 Analytics and Digital Science Dimensions; and
  • AI companies and investment - Crunchbase.

These three resources provide crucial insights into the AI landscape. Scientific publications are key indicators of a country's research strength. Patents measure the transformation of research into inventions that can be used to design and commercialize new products and services. Companies—especially privately held companies (or startups, broadly speaking)—are crucial representatives of innovation, so private equity investment flows into these companies provide useful insights into the health and growth of the country's private sector. Taken together, the three represent crucial elements of a country's R&D and commercial sector, forming some of CSET's key sources of research in studying AI.

Selection of Countries

CAT displays data on countries, as well as certain notable territories and regions. Countries and territories are chosen based upon data availability, which varies across the three datasets supporting the three indicators of interest. For instance, as of November 2021, CSET’s merged corpus had AI research data from 217 countries and territories, the CSET unified patents dataset captured AI patents data for 52 countries' patent offices, and the Crunchbase companies and investment dataset covered 130 countries with at least one AI company listed. We have chosen to display data for all countries and territories wherever there is information available. However, this means that some countries and territories will only have data available for certain indicators.

Selection of Metrics
AI Research

CSET's merged corpus of scholarly literature combines (and deduplicates) scholarship from 2010 onward across six datasets: arXiv, China National Knowledge Infrastructure (CNKI), Digital Science, Microsoft Academic Graph (MAG), Papers With Code, and Web of Science.2 This collection contains more than 270 million documents (as of May 2022) in multiple languages, which represents a significant proportion of the world's scientific literature. To identify AI papers within this merged corpus, we inferred a functional definition from papers in the arXiv.org repository, training a model to predict the categories that authors and editors assign to the site's papers. Applying this model to our broader corpus, we identified papers as AI-relevant if they would likely receive an AI-relevant categorization if uploaded to arXiv.org.3 This method tags only English-language publications for AI relevance. To include potential Chinese AI papers, we supplemented this approach by also treating papers that match a set of manually identified English and Chinese keyphrases as AI-relevant. Our approach is not perfect and may capture some papers that are not AI-relevant while excluding some AI-related papers. CSET continues to improve its AI research classifier, and these improvements will be reflected in future versions of CAT.

We attribute papers to a country based on author affiliation, or, more specifically, if one or more of the authors had an organizational (universities, companies, government, etc.) affiliation in that country. An organization's affiliation with a country is based on the data provided by our vendors for the merged corpus dataset and some entity resolution undertaken by CSET. While papers are not double-counted within a country, a paper with multiple authors having varied country affiliations will figure in the counts of multiple countries. For instance, a paper with two French authors and one Spanish author will count as one paper for France and one for Spain. Therefore, all research papers linked with one country should not be seen as exclusive to that country.
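
The counting rule above can be sketched in a few lines of Python. This is only an illustration: the record layout (`authors`, `country`) and the function name are assumptions made for the example, not CSET's actual schema or code.

```python
from collections import Counter

def count_papers_by_country(papers):
    """Count each paper once for every distinct country among its authors."""
    counts = Counter()
    for paper in papers:
        # A set removes repeated countries within one paper, so two French
        # authors on the same paper still add only one count for France.
        countries = {author["country"] for author in paper["authors"]}
        counts.update(countries)
    return counts

# Hypothetical records for illustration only.
papers = [
    {"id": "p1", "authors": [{"country": "France"},
                             {"country": "France"},
                             {"country": "Spain"}]},
    {"id": "p2", "authors": [{"country": "Spain"}]},
]
country_counts = count_papers_by_country(papers)
```

In this example the per-country counts sum to three even though there are only two papers, which is why country totals should not be added together to estimate a global paper count.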

Similarly, the top authors listed within one country can also be non-exclusive. Authors affiliated with institutions based in different countries will be listed under multiple countries. The top 10 authors within a country are selected through a two-step process. First, we assign ranks to authors and their organizational affiliations based on the number of papers published since 2010 within a country. Second, we select the top 10 author-organizational affiliation pairs based on where they garnered the highest number of citations, thereby dropping repeated names of authors who have been affiliated with multiple organizations since 2010. For example, if an author named Jane Smith appeared in the top 10 authors list for the United States (based on the number of AI papers written since 2010) under Duke University, the University of North Carolina, and Georgia State University, we would display Jane Smith once, under the affiliation with the largest number of citations. Authors who have produced fewer than five papers since 2010 were dropped from our assessment.
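
One possible reading of this two-step selection is sketched below. The row layout, the function name, and the choice to apply the five-paper cutoff to an author's total across affiliations are all assumptions made for illustration; the actual CSET pipeline may differ.

```python
def top_authors(rows, n=10, min_papers=5):
    """Select up to n authors for one country.

    rows: dicts with 'author', 'org', 'papers', and 'citations' keys,
    one row per author-organization pair (hypothetical layout).
    """
    # Assumption: the minimum-paper cutoff applies to an author's total
    # paper count summed over all of their affiliations.
    totals = {}
    for r in rows:
        totals[r["author"]] = totals.get(r["author"], 0) + r["papers"]
    eligible = [r for r in rows if totals[r["author"]] >= min_papers]

    # For each author, remember the affiliation with the most citations.
    best = {}
    for r in eligible:
        current = best.get(r["author"])
        if current is None or r["citations"] > current["citations"]:
            best[r["author"]] = r

    # Step 1: rank author-organization pairs by paper count.
    ranked = sorted(eligible, key=lambda r: r["papers"], reverse=True)

    # Step 2: walk the ranking, showing each author once under the
    # affiliation that has the most citations.
    selected, seen = [], set()
    for r in ranked:
        if r["author"] in seen:
            continue
        seen.add(r["author"])
        selected.append(best[r["author"]])
        if len(selected) == n:
            break
    return selected

# Hypothetical data for illustration only.
rows = [
    {"author": "Jane Smith", "org": "Duke University", "papers": 12, "citations": 300},
    {"author": "Jane Smith", "org": "University of North Carolina", "papers": 8, "citations": 450},
    {"author": "Jane Smith", "org": "Georgia State University", "papers": 3, "citations": 40},
    {"author": "Alex Lee", "org": "Example University", "papers": 10, "citations": 200},
    {"author": "Sam Roe", "org": "Example Institute", "papers": 2, "citations": 900},
]
result = top_authors(rows, n=2)
```

Here Jane Smith appears once, under the University of North Carolina, and Sam Roe is dropped for having fewer than five papers despite a high citation count.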

AI-related papers are classified into fields of study based on Microsoft Academic Graph level-1 field classification.4 MAG provides multiple topic scores for papers indicating their association with various fields. In our calculation of the distribution of papers across fields, each paper is counted in only one category, based on its highest-scoring field. Additionally, when a paper lacks MAG classification but is closely linked to the “citation networks” of other papers, we impute its MAG fields based on the papers it is closely related to. Papers with missing MAG fields and whose MAG fields could not be imputed are allocated to the “unclassified” category. While displaying the subfields under AI research, we include only the top 20 MAG fields by number of AI-related papers published globally and drop the remaining fields from the view. Among the 20 fields, we also list "Other AI" as a subfield of all AI-related papers. This captures AI papers that generally belong to the field but do not specialize in one area.
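
A minimal sketch of the single-field assignment described above, assuming each paper carries a hypothetical `field_scores` mapping from field name to score. The imputation step based on citation networks is not modeled here, and the names are illustrative rather than CSET's own.

```python
from collections import Counter

def assign_field(scores):
    """Return the highest-scoring field, or 'Unclassified' if none exist."""
    if not scores:
        return "Unclassified"
    return max(scores, key=scores.get)

def field_distribution(papers, top_n=20):
    """Count each paper in exactly one field and keep the top_n fields."""
    counts = Counter(assign_field(p.get("field_scores")) for p in papers)
    return counts.most_common(top_n)

# Hypothetical papers and scores for illustration only.
papers = [
    {"id": "a", "field_scores": {"Computer vision": 0.8, "Machine learning": 0.6}},
    {"id": "b", "field_scores": {"Machine learning": 0.9}},
    {"id": "c", "field_scores": {}},
    {"id": "d", "field_scores": {"Machine learning": 0.7,
                                 "Natural language processing": 0.4}},
]
distribution = field_distribution(papers, top_n=2)
```

Because every paper lands in exactly one field, the field counts in this sketch sum to the number of papers, unlike the country counts, which are non-exclusive.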

AI Patents

CSET draws its patent data from a worldwide AI-relevant patents database derived from patent categorizations by 1790 Analytics.5 CAT’s patent metrics describe where patents are being filed, not which country has the most patents. In other words, CAT can’t tell you how many AI patents are owned by Americans, but it can tell you how many patents were filed in the U.S. patent office. There may be overlap between these two categories, but it’s not a perfect match: for example, about half of patent applications filed in the United States are from overseas.

We are working to build metrics based on inventor nationality for future versions of CAT. In the meantime, you can use filing-location-based metrics to understand where AI innovators are most interested in protecting their inventions, and in turn, where they may be conducting R&D, manufacturing, marketing, expanding operations, or competing with foreign companies.

There are two distinct documents that may be associated with a patent: patent applications (requests pending at a patent office for the grant of a patent) and patent grants (approved requests awarding a property right for that invention).6 If a set of documents related to the same invention contains a grant document, then that patent is classified as a granted patent; if it does not, it is classified as a patent application.
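
The grant-versus-application rule can be expressed as a short check over the documents tied to one invention. The document layout (`kind`, `office`) and the family identifiers below are hypothetical and used only for illustration.

```python
def classify_family(documents):
    """Classify one invention's set of documents.

    documents: list of dicts whose 'kind' is 'application' or 'grant'
    (hypothetical layout). A single grant document is enough to make
    the whole set count as a granted patent.
    """
    if any(doc["kind"] == "grant" for doc in documents):
        return "granted"
    return "application"

# Hypothetical document sets for illustration only.
families = {
    "family-1": [{"kind": "application", "office": "US"},
                 {"kind": "grant", "office": "US"}],
    "family-2": [{"kind": "application", "office": "JP"}],
}
status = {fid: classify_family(docs) for fid, docs in families.items()}
```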

CSET identifies AI patents using a combination of keywords and patent classifications, notably the International Patent Classification and the Cooperative Patent Classification.7 All AI patents can be classified into various fields, which are grouped together into three broad categories:8

  • Techniques—how does an invention work (e.g., machine learning, logic models, etc.);
  • Applications—what does the invention do (e.g., speech processing, computer vision, etc.); and
  • Industries—where can the invention be used (e.g., life sciences, transportation, etc.).9

CAT displays data across all three categories, and taken together, all the fields falling under these three categories represent AI patents at the subcomponent level.

It is important to note that a patent may be included in more than one category or field. For example, a patent describing a self-driving car using a machine learning–based speech recognition tool to securely communicate information to a controller could have entries in multiple categories or fields (technique: machine learning; application: speech processing; industry: transportation and telecommunications).10 Therefore, a patent listed in any category or field should not be seen as being exclusive to that category or field.

CSET's patent dataset also includes other details about a patent, such as the patent's assignees and its priority and publication countries. A patent assignee is an entity that holds the property right to the patent. Assignees can be individual inventors or companies.11 An assignee's country of affiliation is based on where the first patent document related to that invention is filed.12 If two documents related to the same invention were filed on the same date in different countries' patent offices, we counted the invention once for each country or defaulted to entities that aren't supranational (i.e., not the European Union or WIPO).

We can disaggregate patents by priority country (where the first document for a patent is filed) and publication country (where a patent related to the same invention is subsequently applied for or receives a grant). This cross-filing activity can provide crucial insights on trends in AI patents filed by inventors across multiple countries, such as which countries' markets inventors are most interested in bringing their inventions to.

Since patents get linked to a country based on the jurisdiction where the first document is filed, the total counts of patent applications linked to a country based on first filing (i.e., listed under patent summary metrics) are sometimes different from the counts of patent applications that are subsequently applied for within the country (i.e., listed for the country as a patent application publication country). This is because the first patent document related to an invention is not always an application.

AI Companies & Investment

The assessments made in this section are based on Crunchbase data in CSET's investment database containing information on companies, venture capital funding rounds, and other financial data. Crunchbase data is very different from the research and patents datasets at CSET because there is less visibility into how its data is collected, where it comes from, and how it is classified. It is likely to have gaps, but, nevertheless, it is estimated to be a relatively comprehensive and accurate source for getting estimates of companies and investment transactions in those companies.13 We track AI investment by measuring equity investment into privately held AI companies, that is, companies not publicly traded on a stock exchange.14

There is no single objective definition of an “AI company.” Crunchbase tags a certain set of companies as AI companies based on submissions from registered Crunchbase users. Previous CSET research by Arnold et al. has attempted to identify AI companies by searching across company descriptions for certain AI-related keywords and keyword combinations.15 Recently, CSET launched its Private-sector AI-Related Activity Tracker (PARAT), which includes companies with various degrees of AI activity (AI publications, conference publications, and patents) allowing users to identify companies relevant to their work.16

CAT classifies a company as AI if it falls under any one of the three categories:

  • All companies identified as AI by CSET's keyword search over business descriptions in Crunchbase;
  • All companies tagged as “artificial intelligence” companies in the Crunchbase business classification; and
  • All companies included in PARAT with at least one “AI publication” or at least one “top AI Conference Publication” or at least one “AI Patent.”
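
The three criteria above combine as a logical OR. The sketch below assumes a hypothetical `Company` record whose fields stand in for the keyword-search result, the Crunchbase tag, and the PARAT counts; none of these names come from CSET's or Crunchbase's actual schemas.

```python
from dataclasses import dataclass

@dataclass
class Company:
    # All field names are hypothetical and chosen for this example.
    name: str
    keyword_match: bool = False          # found by keyword search over descriptions
    crunchbase_ai_tag: bool = False      # tagged "artificial intelligence"
    parat_ai_publications: int = 0       # zero also covers "not in PARAT"
    parat_top_conference_publications: int = 0
    parat_ai_patents: int = 0

def is_ai_company(c):
    """Return True if the company meets any one of the three criteria."""
    parat_activity = (
        c.parat_ai_publications >= 1
        or c.parat_top_conference_publications >= 1
        or c.parat_ai_patents >= 1
    )
    return c.keyword_match or c.crunchbase_ai_tag or parat_activity

# Hypothetical companies for illustration only.
companies = [
    Company("Alpha", keyword_match=True),
    Company("Beta", crunchbase_ai_tag=True),
    Company("Gamma", parat_ai_patents=3),
    Company("Delta"),
]
ai_names = [c.name for c in companies if is_ai_company(c)]
```

Using an OR across the three sources favors broad coverage: a company missed by one mechanism can still be picked up by another.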

This method is designed to capture a broad range of companies with AI-related activities across the globe, even for smaller countries or territories. There was significant overlap across the AI companies identified by each of these three mechanisms, which further affirmed our confidence in our results. Our approach may include some companies and transactions other analysts leave out and exclude companies and transactions that others describe as AI-related.17 We assume that each company has the nationality of the country where it is headquartered, according to the Crunchbase dataset.

When looking at investors in AI-related equity transactions, we assume each organizational investor has the nationality of the country where it (for corporate investors) or its managing entity (for VC and PE funds) is headquartered. Therefore, an investment firm based in San Francisco will be classified as American even if most of its investors are Chinese. We group investments into years based on the date they were announced according to Crunchbase. When looking at transaction counts, we count private equity transactions and venture capital rounds with multiple investors as a single transaction, and not as multiple investments. Equity investment transaction values are often kept confidential. We display the available value totals as “disclosed value,” and where not available, we impute values based on median amounts from funding rounds of a similar investment stage, target country, and year.18
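
The median-based imputation for undisclosed values can be sketched as follows. The transaction layout, the grouping key, and the `estimated_value` and `imputed` output fields are assumptions for illustration; this is a sketch of the general idea, not the procedure documented in the cited report.

```python
from collections import defaultdict
from statistics import median

def impute_values(transactions):
    """Fill undisclosed values with the median of comparable rounds.

    transactions: dicts with 'stage', 'country', 'year', and 'value'
    (None when undisclosed). This layout is hypothetical.
    """
    # Collect disclosed values per (stage, target country, year) group.
    groups = defaultdict(list)
    for t in transactions:
        if t["value"] is not None:
            groups[(t["stage"], t["country"], t["year"])].append(t["value"])

    result = []
    for t in transactions:
        key = (t["stage"], t["country"], t["year"])
        if t["value"] is not None:
            result.append({**t, "estimated_value": t["value"], "imputed": False})
        elif key in groups:
            result.append({**t, "estimated_value": median(groups[key]), "imputed": True})
        else:
            # No comparable disclosed round exists, so the value stays unknown.
            result.append({**t, "estimated_value": None, "imputed": False})
    return result

# Hypothetical transactions (values in millions) for illustration only.
transactions = [
    {"stage": "seed", "country": "US", "year": 2020, "value": 1.0},
    {"stage": "seed", "country": "US", "year": 2020, "value": 3.0},
    {"stage": "seed", "country": "US", "year": 2020, "value": None},
    {"stage": "series_a", "country": "FR", "year": 2021, "value": None},
]
filled = impute_values(transactions)
```

Keeping an explicit `imputed` flag lets disclosed totals and estimated totals be reported separately.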

To better understand the AI industry, we developed a set of 37 different applications or sub-fields that AI companies might primarily focus on. AI companies were allocated to these fields based on existing Crunchbase industry groups and industry tags (or categories).19 Certain Crunchbase industry groups, like “health care” and “transportation,” that had direct parallels with CSET fields, like “Healthcare and Life Sciences” and “Transportation,” were allocated companies directly. For certain CSET fields with no direct parallels to Crunchbase groups, we allocated companies through Crunchbase categories instead.20

Companies can be tagged to more than one field and can be seen as operating in more than one line of business. Therefore, we count a non-exclusive number of companies under each field.21 Companies that lack sufficient information to categorize or that cannot be reasonably placed in any other field are listed under “Other.”

For more details on our classification system, please see this GitHub page.


1 Data sourced from Dimensions, an inter-linked research information system provided by Digital Science (https://www.dimensions.ai). All China National Knowledge Infrastructure content is furnished for use in the United States by East View Information Services, Minneapolis, MN, USA.

2 For more information on how we generated our merged corpus of scholarly literature, see section 2.1 in Rahkovsky, Ilya, et al. "AI Research Funding Portfolios and Extreme Growth." Frontiers in Research Metrics and Analytics 6 (2021): 11.

3 For more details on how an AI-relevant paper is defined, please see James Dunham, Jennifer Melot, and Dewey Murdick, “Identifying the Development and Application of Artificial Intelligence in Scientific Text,” arXiv preprint arXiv:2002.07143 (2020), https://arxiv.org/abs/2002.07143.

4 These fields were previously drawn from Microsoft Academic Graph (MAG), which was discontinued in 2021. CSET researchers built a MAG-inspired field classification model with similar hierarchies and categories. Please note that the current level 1 classification also lists artificial intelligence as a general subfield within all AI papers. This likely captures AI papers that generally belong to the field but do not specialize in any one area listed by the field model.

5 For more details on 1790 Analytics, see "Data resources," 1790 Analytics, https://1790analytics.com/#data.

6 We drop some other patent documents such as amendments or other administrative documents from our dataset.

7 For more details, see Thomas and Murdick, “Patents and Artificial Intelligence: A Primer.”

8 The three categories are taken from a WIPO study, with some fields added to the categories to reflect recent technological developments (e.g., nanotechnology, semiconductors). The patent classifications and keywords used to allocate patents to fields were developed by Patrick Thomas of 1790 Analytics. For more details, see Patrick Thomas and Dewey Murdick, “Patents and Artificial Intelligence: A Primer” (Center for Security and Emerging Technology, September 2020), https://cset.georgetown.edu/wp-content/uploads/CSET-Patents-and-Artificial-Intelligence.pdf; World Intellectual Property Organization, WIPO Technology Trends 2019: Artificial Intelligence (Geneva: WIPO, 2019), https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055.pdf.

9 The WIPO study and Thomas and Murdick's work referred to “Industries” as “Application Fields” in their assessments. Since “fields” represent the various AI subfields across the three indicators in CAT, we've modified the name to avoid confusion in terminology.

10 Thomas and Murdick, “Patents and Artificial Intelligence: A Primer.”

11 If a patent initially filed by an individual/company gets acquired or otherwise transferred to another individual/company, our patent assignee data will count that patent under both entities.

12 Note that we did not deduplicate assignees unless they were identical across both 1790 Analytics’ and Digital Science Dimensions’ datasets.

13 Zachary Arnold, Ilya Rahkovsky and Tina Huang, “Tracking AI Investment: Initial Findings from the Private Markets” (Center for Security and Emerging Technology, September 2020), https://cset.georgetown.edu/publication/tracking-ai-investment/.

14 Our measurement of investment transactions and transaction value is limited to only AI companies listed by Crunchbase. In our analysis, private equity transactions, venture capital rounds, and mergers and acquisitions are included, whereas crowdfunding, debt finance, grants, and other non-equity contributions are not.

15 For more details, see Arnold, Rahkovsky, and Huang, “Tracking AI Investment.”

16 Rebecca Gelles, Zachary Arnold, Ngor Luong, and Jennifer Melot, "PARAT," Center for Security and Emerging Technology, accessed June 14, 2022, https://parat.cset.tech.

17 This approach does not distinguish companies by business maturity; i.e., a well-established company applying AI methods and an early-stage startup developing a machine learning technique will be placed in the same category.

18 For more details, see Arnold, Rahkovsky, and Huang, “Tracking AI Investment.”

19 “What Industries are included in Crunchbase?,” Crunchbase, accessed June 14, 2022, https://support.crunchbase.com/hc/en-us/articles/360043146954-What-Industries-are-included-in-Crunchbase-.

20 For instance, CSET fields such as “Semiconductor” did not exist as an industry group in Crunchbase. Therefore, companies were allocated to the field from Crunchbase categories such as semiconductor, FPGA, DSP, GPU, and ASIC. For some other CSET fields, like “Privacy and Security,” companies were allocated both from the Crunchbase industry group of privacy and security and from the category of biometrics.

21 By utilizing Crunchbase industry tags and only breaking up Crunchbase categories where needed, we have tried to minimize double-counting of companies across fields.

The Country Activity Tracker may be used solely for non-commercial research purposes and may not be downloaded, copied, or extracted using web scrapers or other automated or semi-automated means. The Country Activity Tracker contains metrics derived from data provided by third parties including Clarivate Analytics, East View Information Services, Crunchbase, and 1790 Analytics, as well as data sourced from Dimensions, an inter-linked research information system provided by Digital Science http://www.dimensions.ai.