Software Plagiarism Detection
In my seven years as a legal practitioner, I have dealt with ‘techpreneurs’ and programmers on matters involving software infringement, employees and ex-employees using trade secrets for their own benefit despite written agreements, and contracts to develop software that is not ‘open source’.
Illegal copying of software is a growing and evolving concern in the technology industry. In this article I address the area of technology law research that has traditionally been called the identification of software plagiarism.
To see how valuable software is, simply ask any programmer who has spent countless hours racking his brain trying to implement a complicated function or debug a piece of code that simply would not run. The human-readable source code of software contains intellectual property, and the machine-executable binary code produced from it often does too.
Intellectual property is intangible, but it is valuable nevertheless. IP, like physical property, may be owned, exchanged, or merely held. It can also be stolen. Just as you would not want anyone to take the fruits of your physical labour, such as the car you built or the blueprints for a water-powered generator engine, you would not want anyone to take the fruits of your mental labour. That is a kind of theft. Similarly, once you pay for a TV it is yours, and stealing it is theft; likewise, taking software source code, object code, or software patents that you bought is theft, but not intellectual property theft.
The aim of this article is to describe how software intellectual property can be measured and analysed, as accurately and impartially as possible, in order to value intellectual property rights or to decide whether software intellectual property has been misappropriated or infringed.
I will begin by discussing the numerous "tech plagiarism identification" methods and algorithms that have been created in recent years.
Characterising program source code in a practical way makes it possible to determine correlation and, ultimately, to decide whether copying has taken place. While the ideas and concepts are broad enough to apply in different fields of computer science, they are of special importance in litigation.[1]
This article is aimed at computer scientists, computer engineers, company administrators and software developers, judges, lawyers, expert witnesses, and professional analysts or forensic experts.
Lawyers can learn how intellectual property infringement is detected, how changes to intellectual property in software are measured, and how results, whether favourable or unfavourable, can best be presented in court. This article cannot convey in-depth legal expertise, because there is no known Nigerian case law on software infringement. However, principles and precedents from foreign jurisdictions such as the United Kingdom and the USA offer some persuasive guidance on what Nigerian courts should do. The article also gives lawyers an opportunity to grasp the principles involved in evaluating software intellectual property, which helps in selecting experts and expert witnesses and in examining and cross-examining the opposing party's expert witnesses.
Given that a programme may be expressed in written form and that copyright protects the form or expression of an idea rather than the idea itself, copyright looks to be an appropriate means of protection in principle. This has the advantage of preserving the program's structure rather than the concepts that underpin it, allowing another programmer to build an independent programme that performs the same purpose without infringing on the first program's copyright. Furthermore, copyright protection occurs naturally at the creation of the work and normally does not necessitate any essential creativity or originality as long as it is the author's distinct individual craftsmanship - although the requirement for some minimal creativity/originality can cause difficulties in the case of some utilitarian or functional works. When a work is protected by copyright, the law grants the owner of the copyright various rights to govern the circulation and use of the protected material, including, of course, the ability to allow or prohibit duplication. This is meant to be counterbalanced by user rights that allow a certain amount of copying for specific reasons and under specific conditions, referred to as "fair dealing" in Nigerian and UK laws and "fair use" in the US.
There will be copyright infringement if the code's owner prohibits such copying. Source code theft is also a concern in companies, owing to employee mobility and the ease with which code can be copied onto a nearly undetectable flash drive or transmitted by encrypted messaging over the Internet. Patent, trade secret and trademark violations are also major issues.
Copyright has a lengthy lifespan. The Berne Convention's Article 7 provides for copyright protection for the author's lifetime plus 50 years, which has been applied in many nations.
Because computer programmes can become obsolete in a short period of time, a shorter term could be just as effective as a longer one, but neither is the longer term provided under existing copyright law particularly harmful. Overall, copyright may be an effective measure of protection as long as any difficulties stemming from the practical or utilitarian nature of computer programmes are addressed.
There are basic methods of comparing and measuring software theft. The first is source code differentiation, a mathematical method of comparing software source code to find basic similarities and differences.[2] Differentiation is particularly useful for finding code that has been copied directly from one program to another. Although there are other techniques for detecting copying, discussed later in this article, source code differentiation is better at determining not only what has been copied but also what percentage of the code has been copied.
There are several different methods of measuring software, but source code differentiation is particularly suited to measuring development effort and changes in software and intellectual property.
Source code differentiation means measuring the similarity of two sets of source code based on the number of lines of code that match exactly, relative to the total number of lines of code.
Source code differentiation is especially useful for finding and measuring the amount of code copied directly, without alteration, from one application to another or from one version to another.
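The line-matching idea described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the formula from the cited work: the function names are hypothetical, whitespace is normalised before comparison, blank lines are ignored, and the score is expressed as a fraction of the lines in the first file.

```python
from collections import Counter


def normalise(line: str) -> str:
    """Collapse runs of whitespace so that indentation changes do not hide a match."""
    return " ".join(line.split())


def differentiation(source_a: str, source_b: str) -> float:
    """Return the fraction of non-blank lines in source_a that also appear in source_b."""
    lines_a = [normalise(line) for line in source_a.splitlines() if line.strip()]
    lines_b = Counter(normalise(line) for line in source_b.splitlines() if line.strip())
    if not lines_a:
        return 0.0
    matched = 0
    for line in lines_a:
        # Each line in source_b can be used for at most one match.
        if lines_b[line] > 0:
            lines_b[line] -= 1
            matched += 1
    return matched / len(lines_a)


original = "int total = 0;\nfor (i = 0; i < n; i++)\n    total += price[i];\nreturn total;\n"
suspect = "int sum = 0;\nfor (i = 0; i < n; i++)\n    sum += price[i];\nreturn total;\n"
print(differentiation(original, suspect))  # 2 of the 4 lines match exactly, so 0.5
```

Note that renaming a single variable is enough to stop a line from matching, which is why a measure of this kind is best suited to code copied without alteration.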
It should be noted that the field of programming languages is not just vast but constantly evolving with the latest technology trends. Even a brief overview of each language would require a textbook of its own, as computer scientists and programmers can attest. For the sake of illustration, I will briefly mention a few programming languages and their evolved counterparts.
These languages, in order of popularity, are:
· Python
· PHP
· Java (cousin to Kotlin)
· Swift
· C++
· C# (C Sharp)
For the specific purpose of locating copied code, we should treat source code as consisting of three basic types of elements. These are not the categories into which code elements are usually classified. For our purposes, code consists of statements, from which the control structure can be derived; comments, which help to document the code; and strings, which are messages to users.
· Comments: a remark embedded in source code in such a way that it is ignored by the compiler or interpreter, typically to help people understand the code.
· Strings: an ordered sequence of text characters stored consecutively in memory and capable of being processed as a single entity.
· Identifier names: a formal name used in source code to refer to a variable, function, procedure, package, etc. or in an operating system to refer to a process, user, group, etc.
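As a rough illustration of how these elements can be pulled out of a file, the sketch below uses Python's standard `tokenize` module. It is an assumption-laden example: it only works on Python source text, the function name `extract_elements` is hypothetical, and a tool for another language would need that language's own lexer.

```python
import io
import keyword
import tokenize


def extract_elements(source: str) -> dict:
    """Split Python source text into comments, strings and identifier names."""
    elements = {"comments": set(), "strings": set(), "identifiers": set()}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Drop the leading '#' and surrounding spaces.
            elements["comments"].add(tok.string.lstrip("#").strip())
        elif tok.type == tokenize.STRING:
            # The string literal is kept exactly as written, quotes included.
            elements["strings"].add(tok.string)
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Language keywords such as 'def' and 'return' are not identifiers.
            elements["identifiers"].add(tok.string)
    return elements


sample = 'def greet(user_name):\n    # say hello politely\n    return "Hello, " + user_name\n'
found = extract_elements(sample)
print(sorted(found["identifiers"]))  # ['greet', 'user_name']
```

Once the elements are separated like this, each group can be compared on its own between two programmes.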
When determining the authorship of a programme, or of an algorithm within a programme, it is most useful to look for correlations between the elements most likely to be peculiar to individual programmers. These are the comments, strings and identifier names. The most distinctive way to identify the authorship of code is through comment/string correlation and identifier name correlation.[3] This method of identifying the author of software is known as source code correlation.
Source code differentiation differs slightly from source code correlation. Correlation is designed to produce high scores even when only a small part of the code, or only certain elements of it such as identifiers or comments, are correlated. This is deliberate, because correlation is an investigator's tool: it is built to lead the user to suspicious parts of the code, however small those parts are. Code may have been copied from one programme into another, and much of it may since have been altered, either in the normal course of development or to hide the fact that it was copied. For example, identifiers could have been renamed, code rearranged, and instructions replaced with equivalent ones. But perhaps one comment remains the same, and it is an unusual one, or a short sequence of instructions is unchanged. Correlation is designed to score such a similarity highly enough to guide the investigator to that comment or that sequence.
In certain cases, such as legal disputes, the authorship of the source code is relevant. Take, for example, Twitter's dispute with Meta over Threads: Twitter would have to prove that its ex-employees had access to its source code and breached their non-compete clauses. Authorship also matters when two parties have jointly developed the code for a programme and, because of a contractual dispute or a royalty determination, must establish which code was written by which party.
Source code correlation can be used to detect trade secret theft of source code. If literal elements of a programme's source code are copied, this is not only a violation of copyright but quite often also a case of trade secret theft.
Source code is not usually publicly available unless it is open source. Programmers may implement common functions in well-known ways, but each programme has a number of special functions that are implemented in ways not generally known, so they are unique to that programme.
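To make the contrast with differentiation concrete, here is a minimal sketch of a score that stays high when only one kind of element matches. The scoring rule is my own simplification for illustration (overlap measured against the smaller set, and the overall score taken as the maximum across element types); it is not the formula used by any particular tool, and the function names and sample data are hypothetical.

```python
def element_correlation(items_a: set, items_b: set) -> float:
    """Share of the smaller set that also occurs in the other set (0.0 to 1.0)."""
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / min(len(items_a), len(items_b))


def correlation(elements_a: dict, elements_b: dict) -> float:
    """Overall score: the highest correlation found for any single element type."""
    return max(
        element_correlation(elements_a.get(kind, set()), elements_b.get(kind, set()))
        for kind in ("comments", "strings", "identifiers")
    )


# Identifiers and strings were all changed, but one odd comment survived.
program_a = {
    "comments": {"magic offset, ask before changing"},
    "strings": {"Rate:"},
    "identifiers": {"calc_rate", "ledger", "tmp"},
}
program_b = {
    "comments": {"magic offset, ask before changing", "load config"},
    "strings": {"Value:"},
    "identifiers": {"r", "book", "x1"},
}
print(correlation(program_a, program_b))  # 1.0, driven entirely by the shared comment
```

Taking the maximum rather than an average is what lets a single surviving comment flag the pair for closer inspection; the score points to where to look and says nothing about why the match exists.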
OPEN SOURCE SOFTWARE
Most software is hard to write, requiring years of training and experience to do correctly. Most of the code benefits its owner economically because of the expertise and effort involved in its production. Furthermore, studying source code helps a rival to discover the shortcomings of a programme and can be used for reducing the usefulness and value of the programme.
Open source code is software source code that programmers around the world have developed and that any user or organisation can access without charge, provided the licensing terms are met. The licensing conditions usually require that the code carry a notice naming the author. Licences also impose other restrictions on the code, which differ according to the specific licence the original programmer chose, but no fees are usually payable.
Software providers can also buy source code libraries from third parties. A programmer usually pays a one-time fee or a royalty for their use, and any user who makes the necessary payments can use the code subject to the licence restrictions. If two separate programmes use code from the same third-party library, the programmes will show correlations.
The owner of the source code, who may be an employer or a freelance/independent developer, typically makes reasonable efforts to keep the source code confidential. The only exception is where the source code is expressly designated ‘open source’, which means it is available to the public to modify and use. Whether open source code can be used commercially by third parties is at the owner’s discretion. Source code correlation, like any investigator’s tool, does not determine the reason for a correlation. It assigns no fault or culpability. That decision is left to the analyst, and ultimately to the courts, when dealing with infringement or misappropriation of intellectual property.
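Because a shared third-party library produces innocent matches, an analyst may want to set those lines aside before looking at what remains. The sketch below shows one simple way to do that, assuming the library's source text is available; the function name `shared_lines` and the sample code are hypothetical.

```python
def shared_lines(source_a: str, source_b: str, library_source: str = "") -> set:
    """Lines common to both programmes, excluding lines that also occur in a known library."""

    def lines(text: str) -> set:
        # Normalise whitespace and ignore blank lines.
        return {" ".join(line.split()) for line in text.splitlines() if line.strip()}

    return (lines(source_a) & lines(source_b)) - lines(library_source)


library = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
program_a = library + "rate = clamp(raw, 0, 1)\n"
program_b = library + "level = clamp(volume, 0, 10)\n"

print(len(shared_lines(program_a, program_b)))       # 2: both matches come from the library
print(shared_lines(program_a, program_b, library))   # set(): nothing left once it is excluded
```

Whatever survives this filter still needs an explanation, but lines that both parties lawfully obtained from the same library no longer inflate the comparison.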
THE WAY FORWARD
People may also suggest protecting the intellectual property in software by patent, but that is not as straightforward as it looks. It is true that patent protection is typically considered the appropriate way of securing intellectual property in functional, innovative products, whereas copyright is the main means of protecting creative works. Patents protect the underlying concepts as well as the form and expression, whereas copyright protects only the form and expression. Unlike copyright, patents must be applied for and examined for compliance with the fundamental requirements of patentability, which include novelty, inventive step and industrial applicability.
An early study concluded that computer programs ought to be treated as literary works and brought within intellectual property protection under copyright principles.[4]
The challenges of copyright protection for such utilitarian works in certain jurisdictions were acknowledged, but copyright was generally seen to have more advantages than disadvantages. This early research was extremely influential: the strategy followed in individual jurisdictions, as well as by international agreement, was to protect computer programs as literary works and, in general, to bring them within the existing copyright rules.[5]
This seemed a sensible and reasonable approach in many ways. The copyright system is well established and globally recognised; copyright arises automatically when a work is created and offers clear protection against obvious line-by-line copying and piracy of a work. But the choice of copyright might at times appear to raise as many issues as it solves, both philosophical and practical. What is it about computer programs and computer technology that might cause problems for traditional copyright law? Simply put, two broad areas need to be considered.
It is good computing practice to make a backup copy of each computer programme and keep it safe. While even this amount of copying may constitute a technical violation of copyright unless specific permission is granted, such copying does not, in general, pose a threat to the economic exploitation of the programme as a whole. However, because the success of computer technology relies heavily on the ease with which programmes can be duplicated, it is also straightforward to produce many unlawful copies, whether for personal use, for internal use within a business organisation, or for sale on the open market.[6] This is not a problem in terms of the application of copyright law itself, but it does pose substantial difficulties in enforcing it in cases of straight disk-to-disk copying and computer software piracy. Making copies does not require any special equipment, and several copies can be made rapidly and at low cost. These can then be sold for a fraction of the price of the original. Even extensive copying of software within a single organisation can have a significant negative impact on the copyright owner's rights.
In conclusion, developers who want to protect themselves from software plagiarism are strongly advised to file a copy of the work being protected with the Nigerian Copyright Commission as proof. In foreign countries such as the USA, software is treated differently from other works. Where applicable, ensure that the software contains trade secrets, which may strengthen any case for patent protection.