{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Summary Statistics History\n", "\n", "This notebook shows how to use the `inventor_summary_trend_plot()` function to plot the history of key summary statistics (matching rate, homonymy rate, and name variation rate) for PatentsView.org.\n", "\n", "## Step 1: Download Required Files\n", "\n", "The first step is to download the required files, \"g_persistent_inventor.tsv\" and \"g_inventor_not_disambiguated.tsv\", from PatentsView's bulk data downloads. The first file contains the disambiguation history. The second file is used to obtain inventor names." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import wget\n", "import zipfile\n", "import os\n", "from urllib.parse import urlparse\n", "\n", "import plotly.graph_objects as go\n", "import plotly.io as pio\n", "pio.templates.default = \"plotly_white\" # Set plotly theme\n", "\n", "def download_unzip(url, overwrite=False):\n", "    # Download a zip archive, extract it into the working directory, and return the extracted file name.\n", "    basename = os.path.basename(urlparse(url).path)\n", "    # Remove the .zip suffix explicitly: str.rstrip() strips a set of characters, not a suffix.\n", "    filename = basename[:-len(\".zip\")] if basename.endswith(\".zip\") else basename\n", "    if not os.path.isfile(filename) or overwrite:\n", "        wget.download(url)\n", "        with zipfile.ZipFile(basename, 'r') as zip_ref:\n", "            zip_ref.extractall(\".\")\n", "        os.remove(basename)  # Delete the archive once its contents are extracted\n", "    return filename\n", "\n", "persistent_inventor_file = download_unzip(\"https://s3.amazonaws.com/data.patentsview.org/download/g_persistent_inventor.tsv.zip\")\n", "inventor_not_disambiguated_file = download_unzip(\"https://s3.amazonaws.com/data.patentsview.org/download/g_inventor_not_disambiguated.tsv.zip\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Read every column as a string so that identifiers keep their exact form.\n", "persistent_inventor = pd.read_csv(persistent_inventor_file, sep=\"\\t\", dtype=str)\n", "inventor_not_disambiguated = pd.read_csv(inventor_not_disambiguated_file, sep=\"\\t\", dtype=str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: 
Compute Inventor Mention Names\n", "\n", "We now recover the name associated with each inventor mention. Each mention ID is built from the patent ID and the inventor sequence number." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Mention IDs have the form \"US<patent_id>-<inventor_sequence>\".\n", "inventor_not_disambiguated[\"mention_id\"] = \"US\" + inventor_not_disambiguated.patent_id + \"-\" + inventor_not_disambiguated.inventor_sequence\n", "inventor_not_disambiguated[\"name\"] = inventor_not_disambiguated.raw_inventor_name_first + \" \" + inventor_not_disambiguated.raw_inventor_name_last\n", "inventor_not_disambiguated.set_index(\"mention_id\", inplace=True)\n", "names = inventor_not_disambiguated[\"name\"]  # Series of names indexed by mention ID" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Plot Summary Statistics History\n", "\n", "The `persistent_inventor` DataFrame and the `names` Series can now be passed to `inventor_summary_trend_plot()` to obtain the summary statistics history." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": "[Figure: line plot of matching rate, homonymy rate, and name variation rate (legend: metric) against date, 2018 to 2022; y-axis: value, from 0 to 1]" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pv_evaluation.benchmark import inventor_summary_trend_plot\n", "\n", "fig = inventor_summary_trend_plot(persistent_inventor, names)\n", "fig.show(renderer=\"svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The plot's size, title, and margins are adjusted below, and the figure is saved as a PDF file." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": "[Figure: \"Summary Statistics\" line plot of matching rate, homonymy rate, and name variation rate (legend: metric) against date, 2018 to 2022; y-axis: value, from 0 to 1]" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig.update_layout(\n", "    width=800,\n", "    height=300,\n", "    title=\"Summary Statistics\"\n", ")\n", "fig['layout'].update(margin=dict(l=20,r=20,b=20,t=60))\n", "\n", "fig.write_image(\"summary_trend.pdf\")\n", "fig.show(renderer=\"svg\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.7.15 ('pv-evaluation': conda)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.15" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "135eb778a123b23717215bebe642ebc480e0ab0e1bc583cf4971f84281f0b229" } } }, "nbformat": 4, "nbformat_minor": 2 }