Part II — How Machine Learning Gave Us a Better Approach to Patent Evaluation

Patent Quality and Value: Debunking the "All Patents Are Created Equal" Myth

April 11th, 2019 ‧ 6 min read

After discussing the traditional way to evaluate patents and its major shortcomings, let’s look in detail at the machine learning technology that enabled InQuartik’s engineers to introduce a more reliable approach.

Acquisition and Data Cleansing

As illustrated below, the machine learning process behind Patentcloud’s Quality and Value Rankings begins with the acquisition of patent data from multiple sources such as bibliography, specification, and prosecution history.

The machine learning process behind Patentcloud’s Patent Quality and Value Rankings.

After dealing with any missing data or variable rescaling issues through rigorous data cleansing, InQuartik’s data scientists worked with patent professionals to identify a set of 250 defining features.
These features mainly relate to the experience of the stakeholders (i.e., inventors, applicants, agents, and examiners), backward and forward citations, claim structure, and the prosecution history of patents (e.g., rejections, amendments, and changes of attorney).
Before being included in the set, however, each candidate variable had to be validated. Take, for example, the variable “number of independent claims”, validated here against litigated US patents.
A look at the actual data reveals that patents with a higher number of independent claims are more likely to get involved in patent infringement trials:

Therefore, since the Patent Value Ranking reflects the relative tendency of patents to be practiced or monetized after their issuance, this variable was selected as one of the features.
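The kind of check described above can be sketched in a few lines. This is only an illustration, not InQuartik’s actual validation code: the function name, the record fields (`independent_claims`, `litigated`), and the sample records are all hypothetical and made up for the example.

```python
from collections import defaultdict


def litigation_rate_by_claims(patents):
    """Return {independent claim count: share of patents that were litigated}."""
    totals = defaultdict(int)
    litigated = defaultdict(int)
    for patent in patents:
        n = patent["independent_claims"]
        totals[n] += 1
        if patent["litigated"]:
            litigated[n] += 1
    return {n: litigated[n] / totals[n] for n in sorted(totals)}


# Toy records invented for illustration; they are not real patent data.
sample = [
    {"independent_claims": 1, "litigated": False},
    {"independent_claims": 1, "litigated": False},
    {"independent_claims": 1, "litigated": True},
    {"independent_claims": 1, "litigated": False},
    {"independent_claims": 3, "litigated": True},
    {"independent_claims": 3, "litigated": False},
]

rates = litigation_rate_by_claims(sample)
for claims, rate in rates.items():
    print(f"{claims} independent claim(s): {rate:.0%} litigated")
```

A candidate variable would be worth keeping if the rate rises (or falls) consistently with its value on real data, rather than on a toy sample like this one.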

Variables Computing and Model Building

This step involves statistically analyzing the training material by computing the variables in parallel.
The training material for the Patent Quality Ranking comprises patents for which reexamination or Inter Partes Review (IPR) has been requested, while that for the Patent Value Ranking includes transacted, litigated, and forward-cited patents. In both cases, positive and negative label data are taken into consideration.
As an example, the positive label data for the Patent Value Ranking model includes 47,000 patents that have been litigated, licensed, or requested for invalidation, as well as another 47,000 patents with a relatively high frequency of transaction events or forward citations.
A similar number of patents that have never been litigated, licensed, transacted, requested for invalidation, or forward-cited is used as negative label data.
The label data for the Patent Quality Ranking model includes a similarly sized pool of litigated patents: those requested for invalidation serve as negative label data, the rest as positive label data.
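The labeling rules above can be written down as two small functions. This is a minimal sketch under stated assumptions: the field names are hypothetical, and the two thresholds standing in for a “relatively higher frequency” of transactions or forward citations are invented for illustration, since the article does not state the actual cut-offs.

```python
# Hypothetical thresholds for "relatively higher frequency"; the real cut-offs are not published.
HIGH_TRANSACTIONS = 3
HIGH_CITATIONS = 20


def value_label(p):
    """Return 1 (positive), 0 (negative), or None (not used) for the value model."""
    if p["litigated"] or p["licensed"] or p["invalidation_requested"]:
        return 1
    if p["transactions"] >= HIGH_TRANSACTIONS or p["forward_citations"] >= HIGH_CITATIONS:
        return 1
    if p["transactions"] == 0 and p["forward_citations"] == 0:
        return 0  # never litigated, licensed, transacted, challenged, or cited
    return None  # some activity, but not enough to count as a positive example


def quality_label(p):
    """Return 1, 0, or None for the quality model, which only uses litigated patents."""
    if not p["litigated"]:
        return None
    return 0 if p["invalidation_requested"] else 1


# Toy records invented for illustration.
challenged = {"litigated": True, "licensed": False, "invalidation_requested": True,
              "transactions": 0, "forward_citations": 0}
dormant = {"litigated": False, "licensed": False, "invalidation_requested": False,
           "transactions": 0, "forward_citations": 0}
in_between = {"litigated": False, "licensed": False, "invalidation_requested": False,
              "transactions": 1, "forward_citations": 2}

print(value_label(challenged), quality_label(challenged))
print(value_label(dormant), quality_label(dormant))
print(value_label(in_between), quality_label(in_between))
```

Note that the same patent can be a positive example for value and a negative example for quality, which is why the two models are trained on separate label sets.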

Since absolute scores (or their aggregation and/or difference) would be practically impossible to interpret, the next step is to assess the similarity of each patent with the high-quality or high-value models identified above and provide the resulting relative rankings:

Patentcloud’s Patent Quality and Value Rankings.
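One common way to turn raw model scores into relative rankings is to grade each score by its percentile within a reference population. The sketch below uses that technique as a stand-in; it is not the similarity-based method described above, and the grade labels, their order, and the percentile cut-offs are all assumptions made for illustration.

```python
from bisect import bisect_left

# Assumed grade scale and cut-offs (minimum percentile, label), best grade first.
GRADES = [(0.95, "AAA"), (0.85, "AA"), (0.70, "A"), (0.40, "B"), (0.15, "C"), (0.0, "D")]


def percentile(score, sorted_scores):
    """Share of reference scores strictly below `score`."""
    return bisect_left(sorted_scores, score) / len(sorted_scores)


def grade(score, sorted_scores):
    """Map a raw score to a relative grade using the reference population."""
    pct = percentile(score, sorted_scores)
    for cutoff, label in GRADES:
        if pct >= cutoff:
            return label


# Toy reference population: scores 0..99, already sorted.
reference = list(range(100))
for raw in (97, 50, 5):
    print(raw, grade(raw, reference))
```

The point of the design is that a grade only says where a patent stands relative to others, which is easier to interpret than an absolute score.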

Validation

Following the initial model building phase, InQuartik’s data scientists continued their collaboration with patent professionals to validate the results and optimize the models.
In particular, to continuously track the significance of the correlation between the models and the events they are trying to predict, the team built two monitoring systems: one for patent infringement cases to validate value, the other for USPTO PTAB cases to validate quality.
The monitoring system for the Patent Value Ranking tracks patents that are involved in litigation: since, according to our definition, these patents are deemed to have value because of their potential for being monetized, their Patent Value Rankings should be higher than those of patents that have never been litigated.
As shown below, among the 4,867 litigated patents considered, more than 60% score higher than A:

Below are the detailed figures:

Note: the data is collected from US district court, ITC, and USPTO PTAB IPR cases (inferred as having an infringement dispute behind the IPR petition) ranging from 2017/07/11 to 2018/07/09.

Similarly, reexamination and IPR cases are tracked to verify the reliability of the Patent Quality Ranking. The results are comparable to the previous ones: among the 2,127 patents involved in IPR or reexamination cases considered, more than 60% score lower than C:

Total no. of patents involved in IPR or reexamination cases: 2,127.

The detailed figures are found in the table below:

Note: the data is collected from the USPTO Official Gazette and ranges from 2015/01/06 to 2018/07/10.
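Both monitoring checks come down to the same calculation: the share of tracked patents whose ranking falls above or below a given grade. A minimal sketch follows, assuming a six-step scale ordered from D to AAA and treating “higher than” and “lower than” as strict comparisons; the scale, the function names, and the sample rankings are assumptions, not Patentcloud data.

```python
# Assumed grade scale, worst to best.
ORDER = ["D", "C", "B", "A", "AA", "AAA"]
RANK = {label: position for position, label in enumerate(ORDER)}


def share_above(rankings, threshold):
    """Share of rankings strictly better than `threshold`."""
    return sum(RANK[r] > RANK[threshold] for r in rankings) / len(rankings)


def share_below(rankings, threshold):
    """Share of rankings strictly worse than `threshold`."""
    return sum(RANK[r] < RANK[threshold] for r in rankings) / len(rankings)


# Toy rankings invented for illustration.
litigated_value = ["AAA", "AA", "AA", "A", "B", "C", "AAA", "AA", "D", "A"]
challenged_quality = ["D", "D", "C", "B", "D", "A", "D", "C", "D", "AA"]

print(f"Value ranking above A: {share_above(litigated_value, 'A'):.0%}")
print(f"Quality ranking below C: {share_below(challenged_quality, 'C'):.0%}")
```

Re-running such a check on each new batch of litigation or PTAB cases is one way to track whether the rankings keep their relationship with the events over time.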

The Patent Value Ranking is also validated against data related to patent commercialization, such as patent linkage data (FDA Orange Book), Standard Essential Patent (SEP) declarations (ETSI IPR database), and patent virtual marking data collected from several S&P 500 companies.
Large M&A deals, such as the Nortel deals, are also selected as validation data. All the validations are performed on a portfolio (landscape) or entity basis.
The results conservatively indicate that:
For a patent portfolio or the patents of an entity, the percentage of rankings above A and the percentage of rankings below C are significantly related to the monetization, commercialization, and invalidation events that the rankings are trying to predict.
For further details, please contact our Client Success experts.

Limitations

Patentcloud’s Quality and Value Rankings are an attempt at predicting the likelihood of future events involving patents. The rankings have both strengths and limitations.
Firstly, they should be leveraged exclusively within the correct context as their definitions may not always align with the various “literal meanings” of the terms “Patent Quality” and “Patent Value” in different scenarios.
For example, even though the Patent Value Ranking relates to the likelihood of patents being practiced or transacted, it does not take into consideration the market size of the products practicing a patent or the cost-effective enhancement of the products practicing a patent.
Additionally, a higher Patent Value Ranking doesn’t necessarily mean that a specific patent will be litigated or transacted: patents are rarely litigated or transacted at all.
The rankings, however, provide greater confidence when identifying patents that have been subject to litigation or transaction within massive portfolios.
As shown below, over 30% of the AA/AAA-ranked patents have been transacted after their issuance:

Patent value model validation using US transaction patent data.

The details are found in the table below:

Note: in order to filter out transactions between affiliated companies, only patents transacted more than two times are included in the data set.

However, even though there is a significant difference (about six times) between the highest- and the lowest-ranked patents, around 2/3 of the AA/AAA-ranked patents may never be involved in transactions or litigation.
The closer the match between the definition of the rankings and the scenarios in which they are applied, the more effective the rankings are.
For contexts requiring different assumptions of “Patent Quality” and “Patent Value”, the rankings may still be applicable, but other relevant indicators should be considered and combined for better results.
Finally, it is important to note that each ranking is determined from the data available at the time of publication (or issuance) of the patent, meaning that no post-publication (or post-issuance) data is taken into consideration when generating the ranking itself. Such information, however, may be included in the training sets used for newly published and issued patents.
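In practice, this constraint means that features are computed only from events dated on or before a cut-off. The sketch below shows the idea with a hypothetical event log; the function name, the event types, and the dates are invented for illustration.

```python
from collections import Counter
from datetime import date


def features_as_of(events, cutoff):
    """Count events by type, ignoring anything dated after `cutoff`."""
    return Counter(kind for when, kind in events if when <= cutoff)


# Toy event log for a single patent: (date, event type).
events = [
    (date(2015, 3, 1), "rejection"),
    (date(2015, 9, 1), "amendment"),
    (date(2016, 2, 1), "rejection"),
    (date(2018, 5, 1), "litigation"),  # happens after issuance
]

issuance = date(2017, 1, 1)  # hypothetical issuance date
features = features_as_of(events, issuance)
print(dict(features))
```

Here the later litigation event never reaches the feature set for this patent, although it could still serve as a label when training models for patents issued afterwards.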

Notes:

For example, among the 3,523,853 US patents issued and not abandoned from January 1st, 2000 to August 1st, 2018, only 64,516 (1.8%) have been involved in litigation and only 587,418 (16.7%) have been transacted.
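The percentages in this note follow directly from the counts it quotes, as the short calculation below reproduces. The variable names are ours.

```python
issued = 3_523_853    # US patents issued and not abandoned, 2000-01-01 to 2018-08-01
litigated = 64_516
transacted = 587_418

litigated_pct = round(100 * litigated / issued, 1)
transacted_pct = round(100 * transacted / issued, 1)

print(f"Litigated: {litigated_pct}%")    # 1.8%
print(f"Transacted: {transacted_pct}%")  # 16.7%
```

These base rates put the validation figures in context: litigation and transactions are rare, so even a high-ranked patent is more likely than not to see neither.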
