{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summary Statistics History\n",
"\n",
"This notebook showcases the use of the `inventor_summary_trend_plot()` function to easily plot the history of key summary statistics (matching rate, homonymy rate, and name variation rate) for PatentsView.org.\n",
"\n",
"## Step 1: Download Required Files\n",
"\n",
"The first step is to download required files, namely \"g_persistent_inventor.tsv\" and \"g_inventor_not_disambiguated.tsv\" from PatentsView's bulk data downloads. The first file contains the disambiguation history. The second file is used to obtain inventor names."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import wget\n",
"import zipfile\n",
"import os\n",
"from urllib.parse import urlparse\n",
"\n",
"import plotly.graph_objects as go\n",
"import plotly.io as pio\n",
"pio.templates.default = \"plotly_white\" # Set plotly theme\n",
"\n",
"def download_unzip(url, overwrite=False):\n",
" basename = os.path.basename(urlparse(url).path)\n",
" filename = basename.rstrip(\".zip\")\n",
" if not os.path.isfile(filename) or overwrite:\n",
" wget.download(url)\n",
" with zipfile.ZipFile(basename, 'r') as zip_ref:\n",
" zip_ref.extractall(\".\")\n",
" os.remove(basename)\n",
" return filename\n",
"\n",
"persistent_inventor_file = download_unzip(\"https://s3.amazonaws.com/data.patentsview.org/download/g_persistent_inventor.tsv.zip\")\n",
"inventor_not_disambiguated_file = download_unzip(\"https://s3.amazonaws.com/data.patentsview.org/download/g_inventor_not_disambiguated.tsv.zip\")"
]
},
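{
"cell_type": "markdown",
"metadata": {},
"source": [
"The name of the extracted file is derived from the archive URL by dropping the final \".zip\" extension. As a minimal sketch of that step on its own, the cell below applies it to one of the URLs above using only the standard library. Nothing is downloaded, and the variable names are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from urllib.parse import urlparse\n",
"\n",
"url = \"https://s3.amazonaws.com/data.patentsview.org/download/g_persistent_inventor.tsv.zip\"\n",
"\n",
"# Last component of the URL path, i.e. the archive name.\n",
"archive_name = os.path.basename(urlparse(url).path)\n",
"\n",
"# splitext() splits off only the final extension, leaving the \".tsv\" part intact.\n",
"extracted_name, extension = os.path.splitext(archive_name)\n",
"\n",
"print(archive_name)\n",
"print(extracted_name)\n",
"print(extension)"
]
},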
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"persistent_inventor = pd.read_csv(persistent_inventor_file, sep=\"\\t\", dtype=str)\n",
"inventor_not_disambiguated = pd.read_csv(inventor_not_disambiguated_file, sep=\"\\t\", dtype=str)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Compute Inventor Mention Names\n",
"\n",
"We now recover names associated with each inventor mention."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"inventor_not_disambiguated[\"mention_id\"] = \"US\" + inventor_not_disambiguated.patent_id + \"-\" + inventor_not_disambiguated.inventor_sequence\n",
"inventor_not_disambiguated[\"name\"] = inventor_not_disambiguated.raw_inventor_name_first + \" \" + inventor_not_disambiguated.raw_inventor_name_last\n",
"inventor_not_disambiguated.set_index(\"mention_id\", inplace=True)\n",
"names = inventor_not_disambiguated[\"name\"]"
]
},
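{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the construction above, the sketch below applies the same string concatenation to a two-row toy table. The rows are made up for illustration and are not PatentsView data; the column names are the ones used in the previous cell, and `toy` and `toy_names` are hypothetical names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy rows (hypothetical values), stored as strings like the real files read with dtype=str.\n",
"toy = pd.DataFrame({\n",
"    \"patent_id\": [\"1234567\", \"1234567\"],\n",
"    \"inventor_sequence\": [\"0\", \"1\"],\n",
"    \"raw_inventor_name_first\": [\"Jane\", \"John\"],\n",
"    \"raw_inventor_name_last\": [\"Doe\", \"Smith\"],\n",
"})\n",
"\n",
"# Same construction as above: \"US\" + patent_id + \"-\" + inventor_sequence.\n",
"toy[\"mention_id\"] = \"US\" + toy.patent_id + \"-\" + toy.inventor_sequence\n",
"toy[\"name\"] = toy.raw_inventor_name_first + \" \" + toy.raw_inventor_name_last\n",
"\n",
"# Series of names indexed by mention identifier, analogous to `names`.\n",
"toy_names = toy.set_index(\"mention_id\")[\"name\"]\n",
"print(toy_names)"
]
},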
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Plot Summary Statistics History\n",
"\n",
"The `persistent_inventor` and `names` DataFrame can now be passed to `inventor_summary_trend_plot()` to obtain the summary statistics history."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": ""
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from pv_evaluation.benchmark import inventor_summary_trend_plot\n",
"\n",
"fig = inventor_summary_trend_plot(persistent_inventor, names)\n",
"fig.show(renderer=\"svg\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plot is tweaked below."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": ""
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig.update_layout(\n",
" width=800,\n",
" height=300,\n",
" title=\"Summary Statistics\"\n",
")\n",
"fig['layout'].update(margin=dict(l=20,r=20,b=20,t=60))\n",
"\n",
"fig.write_image(\"summary_trend.pdf\")\n",
"fig.show(renderer=\"svg\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7.15 ('pv-evaluation': conda)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.15"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "135eb778a123b23717215bebe642ebc480e0ab0e1bc583cf4971f84281f0b229"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}