{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Disambiguation Performance History\n", "\n", "This notebook shows how to use the `inventor_estimates_trend_plot()` function to plot the history of PatentsView's disambiguation performance since 2017.\n", "\n", "By default, `inventor_estimates_trend_plot()` uses Binette's 2022 inventors benchmark for performance estimates. This benchmark covers the period from 1976 to December 31, 2021. Disambiguations therefore need to be restricted to this timeframe, and the performance estimates are only representative of this period." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Download Required Files From PatentsView's Bulk Data Downloads\n", "\n", "The first step is to download the files \"g_persistent_inventor.tsv\" and \"g_patent.tsv\" from PatentsView's bulk data downloads. The first file contains PatentsView's disambiguation history. The second file contains patent grant dates, which are needed to subset the disambiguations to the required timeframe."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import wget\n", "import zipfile\n", "import os\n", "from urllib.parse import urlparse\n", "\n", "import plotly.graph_objects as go\n", "import plotly.io as pio\n", "pio.templates.default = \"plotly_white\" # Set plotly theme\n", "\n", "def download_unzip(url, overwrite=False):\n", "    # Download a zip archive, extract it in the working directory, and return the extracted file name.\n", "    basename = os.path.basename(urlparse(url).path)\n", "    # Drop the '.zip' suffix (str.rstrip would strip a set of characters, not a suffix).\n", "    filename = basename[:-len(\".zip\")] if basename.endswith(\".zip\") else basename\n", "    # Skip the download if the extracted file already exists, unless overwrite is requested.\n", "    if not os.path.isfile(filename) or overwrite:\n", "        wget.download(url)\n", "        with zipfile.ZipFile(basename, 'r') as zip_ref:\n", "            zip_ref.extractall(\".\")\n", "        os.remove(basename) # Delete the archive once extracted\n", "    return filename\n", "\n", "persistent_inventor_file = download_unzip(\"https://s3.amazonaws.com/data.patentsview.org/download/g_persistent_inventor.tsv.zip\")\n", "patent_file = download_unzip(\"https://s3.amazonaws.com/data.patentsview.org/download/g_patent.tsv.zip\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Subset Disambiguations to the Required Timeframe\n", "\n", "We can now subset disambiguations to the same timeframe as Binette's 2022 inventors benchmark, that is, to patents granted before January 1, 2022." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "persistent_inventor = pd.read_csv(persistent_inventor_file, sep=\"\\t\", dtype=str)\n", "patent = pd.read_csv(patent_file, sep=\"\\t\", dtype=str)\n", "# Attach each patent's grant date to the disambiguation history.\n", "persistent_inventor = persistent_inventor.merge(patent[[\"patent_id\", \"patent_date\"]], on=\"patent_id\", how=\"left\")\n", "\n", "# Dates are ISO-formatted strings (YYYY-MM-DD), so string comparison orders them chronologically.\n", "# Rows without a matching grant date are dropped by this filter.\n", "persistent_inventor = persistent_inventor.query(\"patent_date < '2022-01-01'\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Plot Disambiguation History\n", "\n", "`inventor_estimates_trend_plot()` can now be called on the filtered disambiguation history."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": "[Figure: pairwise precision and pairwise recall estimates (y-axis: value, 0.4 to 1) by date (x-axis), 2018 to 2022]" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pv_evaluation.benchmark import inventor_estimates_trend_plot\n", "\n", "fig = inventor_estimates_trend_plot(persistent_inventor)\n", "fig.show(renderer=\"svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned Plotly figure can be customized as needed, for example by resizing it and adding a title:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": "[Figure: \"Pairwise Precision and Recall\"; pairwise precision and pairwise recall estimates (y-axis: value, 0.4 to 1) by date (x-axis), 2018 to 2022]" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig.update_layout(\n", "    width=800,\n", "    height=300,\n", "    title=\"Pairwise Precision and Recall\",\n", "    margin=dict(l=20, r=20, b=20, t=60),\n", ")\n", "\n", "# Static export requires an image export engine such as kaleido.\n", "fig.write_image(\"performance_trend.pdf\")\n", "fig.show(renderer=\"svg\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.7.15 ('pv-evaluation': conda)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.15" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "135eb778a123b23717215bebe642ebc480e0ab0e1bc583cf4971f84281f0b229" } } }, "nbformat": 4, "nbformat_minor": 2 }