Lots of programs need a function like this: get all the similar file names in a folder, or get all the files that contain similar content. Media players and image browsers are typical examples. It even comes up when organizing files on a server.
Here are the approaches that come to mind.
- Levenshtein distance
- Eigen value of a string
- Vectorize a string
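
To make the first item concrete, here is a minimal sketch of Levenshtein distance in Python. The function name `levenshtein` and the unit cost for every insertion, deletion and substitution are assumptions made for illustration. A real project might prefer an existing library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, with insert/delete/substitute each costing 1."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each row stays short
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]
```

Only two rows of the table are kept at a time, so memory use is proportional to the shorter string, while the running time is proportional to the product of the two lengths.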
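Levenshtein distance is easy to use and fast to compute for a single pair of strings, but caching the results directly is a bad idea: a distance belongs to a pair of strings, so the cache grows with the number of pairs, and every new string still has to be compared against all the others. In the end we recompute the distances every time, which just costs more CPU time. Vectorizing a string may be more difficult, but each string gets its own eigen value that can be cached, so time is saved and it won’t take up too much storage.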
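
As a rough sketch of the caching idea, the Python below turns each name into a vector of character bigram counts, stores one vector per name, and compares vectors with cosine similarity. The bigram representation, the cosine measure, and the names `vectorize`, `cosine`, `VectorCache` and `most_similar` are all assumptions chosen for illustration, not the only way to vectorize a string.

```python
import math
from collections import Counter


def vectorize(name: str, n: int = 2) -> Counter:
    """Count the overlapping character n-grams of the lowercased name."""
    text = name.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class VectorCache:
    """Hypothetical in-memory cache: one vector per string, computed once."""

    def __init__(self) -> None:
        self._vectors = {}

    def get(self, name: str) -> Counter:
        if name not in self._vectors:
            self._vectors[name] = vectorize(name)
        return self._vectors[name]


def most_similar(query: str, names: list, cache: VectorCache, top: int = 3) -> list:
    """Return up to `top` (score, name) pairs, best match first."""
    q = cache.get(query)
    scored = [(cosine(q, cache.get(name)), name) for name in names if name != query]
    return sorted(scored, reverse=True)[:top]
```

For example, `most_similar("holiday_001.jpg", ["holiday_002.jpg", "notes.txt"], VectorCache())` ranks `holiday_002.jpg` first. The cache holds one entry per string rather than one per pair, which is where the storage saving comes from. In a real program the cache could be persisted to disk, which this sketch does not attempt.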
Which one to choose depends on the project.