Patent ten million

ConfusedIP · 06-14-18 at 12:10 AM

Quote from: MYK on 06-13-18 at 11:37 PM
Quote from: Weng Tianxiang on 06-13-18 at 10:57 PM
Hi rodya,

"Software like EAST... because yes it did."

Can you explain further why EAST would need to change its code? I don't know what kind of software EAST is.

No database stores a value as 7 or 8 digits; storage is 8 bits, 16 bits, 24 bits or 32 bits.
Weng, not all numbers are stored in binary. Some are stored as text strings. Some are stored in other representations such as packed decimal. You really don't know as much about computers or databases as you think you know.

Weeell..... I mean, not to nitpick, but all data, including text, is stored in bits. Eventually.

Anyways, Weng, the issue is simple - before, when you had a 7-number patent string, you definitely knew it was an issued patent number. 8-number strings were definitely application numbers. Now, when you have an 8-number string, it might be a patent number or it might be an older patent application number - you just don't know without further context or maybe formatting. That's all. Going forward, this ambiguity might cause issues with some software products that were hard-coded to the above rules.
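A minimal Python sketch of the ambiguity described above. The function names and the length-only rule are assumptions for illustration; nothing here reflects how EAST or any real product is written.

```python
import re


def classify_legacy(text: str) -> str:
    """Hypothetical hard-coded rule from before patent 10,000,000:
    the digit count alone decides the type."""
    digits = re.sub(r"\D", "", text)  # drop commas, slashes, spaces
    if not digits:
        return "unknown"
    if len(digits) <= 7:
        return "patent"
    if len(digits) == 8:
        return "application"
    return "unknown"


def classify_current(text: str) -> str:
    """Same rule, updated to admit that 8 digits no longer settles it."""
    digits = re.sub(r"\D", "", text)
    if not digits:
        return "unknown"
    if len(digits) <= 7:
        return "patent"
    if len(digits) == 8:
        return "ambiguous"  # needs context or formatting to decide
    return "unknown"


# The legacy rule mislabels the ten-millionth patent as an application.
print(classify_legacy("10,000,000"))
print(classify_current("10,000,000"))
```

Software built on the first function keeps working for every 7-digit number and silently mislabels every patent from 10,000,000 onward.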

Look at us, lawyers arguing about bits and stuff

Weng Tianxiang · 06-14-18 at 03:09 AM

Hi ConfusedIP,

I can't believe any programmer would be so stupid as to store a text string in any part of a database.

The patent number string is generated by software whenever it is needed, no matter how many digits it has.

Sorry, this is my last post on this topic, because this is not an appropriate place to argue with somebody about basic programming.

MYK · 06-14-18 at 10:26 AM

Quote from: ConfusedIP
I mean, not to nitpick, but all data, including text, is stored in bits. Eventually.

Ok, that's cromulent.

Quote from: Weng Tianxiang on 06-14-18 at 03:09 AM
Hi ConfusedIP,

I can't believe any programmer would be so stupid as to store a text string in any part of a database.

The patent number string is generated by software whenever it is needed, no matter how many digits it has.

Hi Weng,

Yeah, so how are ya gonna store "D796831" (design), "RE41855" (reissue) or "PP283" (plant) in binary? Oh, I know, let's special-case it whenever the patent "number" has non-numeric characters in it, and we can use two different fields, no wait three fields -- a flag field to indicate whether it's numeric or a text string, plus the numeric field, plus the text string field! Plus all the code to handle both cases! Or you could be boring and just have a second field for just the text prefix (maybe even use a bunch of flag bits that need to be decoded!) and merge them together, just leave it empty if the "patent number" is only digits. Or have two fields and always leave one or the other empty by putting in a constraint. And I'm sure we can come up with a way to index any of these nightmare messes.

I did the math, and (if you ignore all the wasted space in the extra fields) we'll save almost 250MB total! That's, like, one whole CDC 9766 disk drive in 1975 disk drives! Hey, we could save another 16MB if we go with a three-byte field until patent number 16,777,216! (And then go into a Y2K-esque panic when we reach that limit!)
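The three-byte figures above can be checked with a few lines of Python. This is only a sketch: it assumes an unsigned three-byte field compared against a four-byte one, and it does not try to reproduce the 250MB figure, since the post does not state the record sizes behind it.

```python
# Largest value an unsigned 3-byte (24-bit) field can hold.
MAX_3BYTE = 2 ** 24 - 1

# First patent number that no longer fits in three bytes.
first_overflow = MAX_3BYTE + 1

# One byte saved per record (4-byte field vs 3-byte field),
# for every record up to the overflow point.
saved_bytes = first_overflow * (4 - 3)
saved_mib = saved_bytes / 2 ** 20

print(first_overflow)
print(saved_mib)
```

So the saving is one byte per record, which comes to 16 MiB by the time the field overflows at 16,777,216.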

Or you can just use a single text field for the whole thing, which works for all of the different types of patents, and have a single database index for the entire field, and not worry about it, which is much simpler and more maintainable. But what do I know, right?! Only a stupid programmer would store a number as a text string! It's not like I ever did this sort of thing for a living for a decade-plus or went to a university for computer science, oh wait, hmm.
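A rough sketch of the single-text-field approach, using Python's built-in sqlite3 module. The table and column names are made up for illustration and are not taken from any real patent database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One text column holds the whole identifier, prefix included.
# Declaring it PRIMARY KEY gives a single index over the full field.
conn.execute("CREATE TABLE patents (doc_number TEXT PRIMARY KEY, kind TEXT)")
conn.executemany(
    "INSERT INTO patents VALUES (?, ?)",
    [
        ("D796831", "design"),
        ("RE41855", "reissue"),
        ("PP283", "plant"),
        ("10000000", "utility"),
    ],
)


def lookup(doc_number: str):
    """Return the kind for an exact identifier, or None if absent."""
    row = conn.execute(
        "SELECT kind FROM patents WHERE doc_number = ?", (doc_number,)
    ).fetchone()
    return row[0] if row else None


print(lookup("RE41855"))
print(lookup("10000000"))
```

No special cases are needed for prefixes, and an eight-digit utility number goes in the same column as everything else.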

But yeah, let's not argue about it any more because that might show that one of us doesn't really have a clue about what we're blathering about.

Weng Tianxiang · 06-14-18 at 12:14 PM

Hi MYK,

I like the way you put the question clearly.

When database designers design a record, there are many things to consider, with one goal:

Use the least memory space to store the most information, with room to expand and the least coding effort.

Actually, a patent record contains many more things than its patent number:

a. Patent number;
b. Patent country;
c. Patent class;
d. application date; <-- dates are stored as numbers, not text strings.
e. issue date;
...

Even though a patent has many information fields, every field whose text values are limited and predictable is encoded as a number, rather than keeping the characters or strings in each record.

Patent country: 8 bits are enough for all countries in the world.

Patent class: it can be represented by a number too.

Now, to your question:
so how are ya gonna store "D796831" (design), "RE41855" (reissue) or "PP283" (plant) in binary?

One can list all information of a patent in a record with enough fields to cover all situations:
1. patent number: 796831, 41855, 283, ...

2. patent classification: "D" = 0, "RE" = 1, "PP" = 2; you can define as many as you want.

3. The strings "D", "RE" and "PP" are stored in one central place and are used only for printed output; a single copy of each string is shared by all records in the database.

4. For any special handling a patent classification needs, the program switches to a different part of the code.

For example, if patent classification = 0, it would print "D" for display if it needs to do so.
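The scheme in points 1 to 4 can be sketched in Python as follows. The codes 0, 1 and 2 come from the list above; the code 3 for an unprefixed utility patent, and all of the names, are assumptions added for illustration.

```python
from typing import NamedTuple

# One shared copy of the prefix strings; the index is the classification code.
# Code 3 (empty prefix, utility patent) is an assumption, not from the post.
PREFIX_TABLE = ("D", "RE", "PP", "")
UTILITY = 3


class PatentRecord(NamedTuple):
    number: int  # numeric part only, e.g. 796831
    kind: int    # classification code, index into PREFIX_TABLE


def encode(text: str) -> PatentRecord:
    """Split a printed identifier into (number, classification code)."""
    for code, prefix in enumerate(PREFIX_TABLE):
        if prefix and text.startswith(prefix):
            return PatentRecord(int(text[len(prefix):]), code)
    return PatentRecord(int(text), UTILITY)


def display(record: PatentRecord) -> str:
    """Rebuild the printed identifier from the stored fields."""
    return f"{PREFIX_TABLE[record.kind]}{record.number}"


for text in ("D796831", "RE41855", "PP283", "10000000"):
    print(encode(text), display(encode(text)))
```

Note that the numeric field alone no longer identifies a record: a lookup has to supply the classification code as well as the number.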

"a flag field to indicate whether it's numeric or a text string"

There is no flag field; it is unnecessary, because every data field in a record is identified by its usage type.

For example, a database would never store a string length of 7 or 8!

I don't want to dig deep into the argument.

Anyway, simply put, the patent number crossing 10 million will never cause any hiccup.

Robert K S · 06-14-18 at 05:25 PM

Quote from: Weng Tianxiang on 06-14-18 at 12:14 PM
One can list all information of a patent in a record with enough fields to cover all situations:
1. patent number: 796831, 41855, 283, ...

Except you would want this field if any to be unique and useful as the primary key, which you could not do if both design and reissue have a "11111" numbered record.
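A small sqlite3 sketch of the collision described above, with made-up table names. It assumes the numeric-field scheme from the quoted post, where classification code 0 is design and 1 is reissue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Numeric part alone as the primary key: two kinds cannot share 11111.
conn.execute("CREATE TABLE by_number (number INTEGER PRIMARY KEY, kind INTEGER)")
conn.execute("INSERT INTO by_number VALUES (11111, 0)")  # design
try:
    conn.execute("INSERT INTO by_number VALUES (11111, 1)")  # reissue
    collided = False
except sqlite3.IntegrityError:
    collided = True

# A composite key over (kind, number) accepts both rows.
conn.execute(
    "CREATE TABLE by_pair (number INTEGER, kind INTEGER, "
    "PRIMARY KEY (kind, number))"
)
conn.execute("INSERT INTO by_pair VALUES (11111, 0)")
conn.execute("INSERT INTO by_pair VALUES (11111, 1)")
pair_count = conn.execute("SELECT COUNT(*) FROM by_pair").fetchone()[0]

print(collided, pair_count)
```

The split-field design can still have a unique key, but only as a composite of both fields, which is one more thing every query and index has to carry.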

Incidentally, it's likely that Google chose to represent patent numbers as text strings in its own database since a search for a reissue by its number alone won't turn up the result you want.

MYK · 06-14-18 at 10:36 PM

Quote from: Weng Tianxiang on 06-14-18 at 12:14 PM
Hi MYK,

I like the way you put the question clearly.

When database designers design a record, there are many things to consider, with one goal:

Use the least memory space to store the most information, with room to expand and the least coding effort.

Actually, Weng, that is almost never the primary goal outside of a classroom exercise. For example, database designs that minimize storage usage are typically highly INefficient compared to databases that contain duplicated data in their various records, because the lack of duplication means that multiple records have to be pulled in order to build the output that a user needs. Each record lookup eats far more resources during runtime than merely duplicating, for example, a patent number in multiple locations. Storage is cheap, record lookups -- especially table joins -- are time-consuming.

This is why "third normal form" database designs are rare in the real world, even though students learn how to create designs using data normalization techniques to eliminate redundancy -- because then they're taught how to analyze when and why to BREAK those rules so that their real-world production databases won't require hardware from the 35th Century in order to retrieve data on human timescales.

Real world databases, especially for large datasets, are never, ever, under any circumstances, going to be designed to minimize storage space. They are going to be designed for query efficiency. If you can avoid a table join in a bunch of common queries by duplicating a field, you do it. If you can avoid having to pull records from multiple tables by putting fields into a record layout that will be left empty in some (even most) records, you do it. If you can reduce the number of indexes by using a sixteen-character text string instead of multiple fields that have to be decoded to be understood, you do it.

You can always buy another disk pack. You can't wait an hour per query while your database system goes around collecting records and matching things up.
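To make the join-versus-duplication trade-off concrete, here is a toy sqlite3 sketch. The tables and columns are hypothetical, and at this size it shows only the shape of the two queries, not any performance difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized: claims refer to the patent by a surrogate id.
    CREATE TABLE patents (id INTEGER PRIMARY KEY, doc_number TEXT);
    CREATE TABLE claims_norm (patent_id INTEGER, claim_no INTEGER);

    -- Denormalized: the patent number is duplicated into every claim row.
    CREATE TABLE claims_denorm (doc_number TEXT, claim_no INTEGER);

    INSERT INTO patents VALUES (1, 'RE41855');
    INSERT INTO claims_norm VALUES (1, 1), (1, 2);
    INSERT INTO claims_denorm VALUES ('RE41855', 1), ('RE41855', 2);
    """
)

# Normalized layout: a join is needed to show the number next to each claim.
joined = conn.execute(
    "SELECT p.doc_number, c.claim_no "
    "FROM claims_norm c JOIN patents p ON p.id = c.patent_id "
    "ORDER BY c.claim_no"
).fetchall()

# Denormalized layout: the same output from a single table.
flat = conn.execute(
    "SELECT doc_number, claim_no FROM claims_denorm ORDER BY claim_no"
).fetchall()

print(joined == flat)
```

Both queries return the same rows. The denormalized table skips the join at read time and pays for it with a duplicated value in every row.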

Weng Tianxiang · 06-14-18 at 11:39 PM

Hi Robert K S,

"Except you would want this field if any to be unique and useful as the primary key"

To be unique and useful as the primary key? Never!

There are many people named "Robert K S" in the US, if not thousands, and that doesn't prevent names from being used as primary keys in a database.

After you enter your full information, exactly one "Robert K S" record pops up, not because "Robert K S" is unique as a primary key, but because all the fields together determine one record.

Actually, there are no limits on which fields can be specified as primary keys.

"Incidentally, it's likely that Google chose to represent patent numbers as text strings in its own database."

Google uses a text search engine algorithm for its service. It is called the suffix array algorithm, and its inventors are Manber & Myers (1990) (https://en.wikipedia.org/wiki/Suffix_array).

Manber:
In 2006, he was hired by Google as one of their vice presidents of engineering. In December 2007, he announced Knol, Google's project to create a knowledge repository.

In October 2010, he was responsible for all the search products at Google.

In October 2014, Manber was named the vice president of engineering at YouTube.

In February 2015, Manber announced that he was leaving YouTube for the National Institutes of Health. He left the role in 2016.

Google's search engine is entirely text-string based and is designed specifically for human interaction.

USPTO's database is different from Google's.

Hi MYK,

I disagree with your opinions:

1. "database designs that minimize storage usage are typically highly INefficient compared to databases that contain duplicated data in their various records, because the lack of duplication means that multiple records have to be pulled in order to build the output that a user needs. "

The most important reason why key and basic data are not duplicated in a database (leaving aside Google's type of text-string search engine) is maintainability. If one wants to change a piece of data, then for consistency every copy of it must be modified, and that is impossible to do reliably when maintaining a database.

Google's database is totally different, and it has no such problem of keeping data unique and consistent.

2. "typically highly INefficient":

A database has many index files, each sorted on one field, so the query-efficiency problem has already been fully resolved.

3. "so that their real-world production databases won't require hardware from the 35th Century"

Without boasting, I am confident that in 3 years I will have a set of hardware patents going into every database and cloud system to further speed up their performance! And I will absolutely post the news of that success on this website.

lazyexaminer · 06-15-18 at 12:03 AM

I may regret entering this pissing contest, and I really have little idea what any of you are talking about, but one possibly important fact is that EAST came out in 1999 (I think). I'm sure it was in development for quite some time before that. It is certainly plausible that what they built is not optimal for how things are today.

I have no idea to what extent the inner workings or the databases or anything have changed or not changed since then, so this may not matter, maybe it was all overhauled at some point. But maybe not, I don't think this kind of thing was ever really a big priority as far as funding goes so I could see them just putting band aid after band aid on over the last 20 years. I know they are planning on rolling out an entirely new search system sometime in the near future, for whatever that's worth.

ConfusedIP · 06-15-18 at 12:08 AM

Without boasting, I am confident that in 3 years this thread will have 1000s of replies and will overtake the "Working in the USPTO" thread on this forum. Weng's issued "hardware patent" No. 10,001098 will have one method claim with 42 elements, requiring performance by 13 distinct entities.

How do I know this is 100% going to happen? I built the time machine (using technology described in patent #1 from another thread) and went to the future to verify this.

MYK · 06-15-18 at 01:55 PM

Quote from: lazyexaminer on 06-15-18 at 12:03 AM
I may regret entering this pissing contest,

I already do. Anyway, I'm going to follow George Carlin's advice, a little late.

Robert K S · 06-19-18 at 06:05 PM

https://patentlyo.com/patent/2018/06/1201-tuesday.html#comments

3... 2... 1...

Robert K S · 06-19-18 at 06:12 PM

Well, post-9:00 AM Eastern, and it's not up in either PAIR or the search database yet, but I do note that a signing ceremony is on the President's schedule for 3:45 PM today.

http://www.whitehousedossier.com/2018/06/18/trump-schedule-tuesday-june-19-2018/

Robert K S · 06-19-18 at 09:06 PM

As of 11:45 AM, patent ten million is up in PAIR. Taking a look at the prosecution history, the first Office action indicated allowable subject-matter in some of the dependent claims.

The file wrapper includes a (granted) petition to expunge trade secret material inadvertently submitted with an IDS. Oops.

There's also this boilerplate "comment on statement of reasons for allowance":

Quote
In response to the Notice of Allowance dated February 26, 2018, the Applicant provides the following in response to the Statement of Reasons for Allowance. The Applicant believes that the claims and description of the invention are adequately described and set forth in the specification such that the Applicant's claimed invention, and terms and features described therein, are readily understood by those of ordinary skill in the art as set forth in the allowed claims. While the Applicant appreciates the Examiner's reasons for allowance, the Applicant believes that other reasons for allowance exist. The Applicant reserves the right to raise these other reasons if ever necessary.

Not sure what that does, if anything.

Then there's a Rule 312 amendment, and also an examiner's amendment, to do some claim cleanup.

No section 101 consideration of the claims anywhere in the prosecution record.

News: