
Cheating or chatting: is ChatGPT a threat to education?

Julien Grimaud
Assistant Professor of Life Sciences at Sup’Biotech
Pavla Debeljak
Assistant Professor of Bioinformatics at Sup'Biotech
Frank Yates
Director of Research at Sup’Biotech Engineering School
Key takeaways
  • ChatGPT is a chatbot, i.e. a computer program designed to simulate conversation with a human, and it produces convincing, natural-sounding text.
  • Educators are therefore concerned about the risks of students using chatbots, for example to ask ChatGPT to write their essays.
  • Tools exist to identify whether a text was written by a chatbot, but it is currently impossible to be 100% sure.
  • To identify whether a text was generated by an AI, one can look for strange wording, unnatural syntax, or instances of plagiarism.
  • With the right guidance, however, chatbots can become powerful allies for teaching and studying, as well as for the professional world.

Commonly used in customer service and marketing, as well as in gaming and education, chatbots have been around for decades [1,2]. The first ever chatbot, ELIZA, developed in the 1960s at MIT's Artificial Intelligence Laboratory, was designed to simulate a psychotherapist, using natural language processing to respond to user input. Sixty years on, chatbots have become increasingly sophisticated, using AI to understand user input and thus provide more natural and intelligent conversations. As the technology continues to progress, chatbots are likely to become even more advanced, allowing for ever more natural and personalised conversations across a variety of industries, from healthcare to finance [3].

ChatGPT, released to the public on November 30th 2022, is a chatbot – a computer program designed to simulate conversation with a human – developed by the San Francisco-based company OpenAI. As its name suggests, it relies on GPT (Generative Pre-trained Transformer), a type of artificial intelligence (AI) model trained on a large amount of text data and used to generate new text in response to users' prompts. ChatGPT has become popular because of its ability to generate convincing and engaging text from natural language queries, which makes it a useful and user-friendly tool for tasks like content creation, automated customer support, and natural language processing [4]. As such, educators are questioning whether the use of chatbots by students is a risk. Moreover, just a few days ago OpenAI released GPT-4, the successor to ChatGPT. It remains to be seen how much more advanced this new version is than the previous one.

Could students use chatbots in a malicious way? 

While cheating is an age-old problem in education [5], AI-based chatbots represent a new route for those willing to cheat by asking questions about assignments or tests. For example, instead of using the reading material provided by the professor, a student might use a chatbot to ask for help with a math problem or to get the answer to a multiple-choice question. Note that this is similar to typing a question into a search engine like Google or Bing (which may soon incorporate ChatGPT [6]). Whether this rather mundane action is considered cheating is up to the teacher.

While cheating is an age-old problem in education, AI-based chatbots represent a new route for those willing to cheat.

Furthermore, some chatbots are even specialised in solving certain types of problems. DeepL Translate, for instance, is an online AI-based language translation service that allows users to translate text, websites, and documents into different languages with high accuracy and speed. Other chatbots specialise in writing computer code, including Codebots and Autocode. While these chatbots are initially designed to assist well-intentioned users with tedious or repetitive tasks, they have the potential to be diverted from their initial purpose by students willing to cheat.

Besides answering short questions, pre-trained AI can be used to generate essays with a semblance of erudition. Paraphrasing tools such as Quillbot, Paperpal, or WordAI have been available for several years and can convincingly turn a poorly written manuscript into a decent academic paper, or indeed alter an original text to escape plagiarism detection. More concerning, however, is the ability of some chatbots to generate lengthy, human-looking essays in seconds, in response to a short prompt.

In ChatGPT, students can very simply adjust various parameters, such as the length of the bot's response, the level of randomness added to the essay, or the AI model variant used by the chatbot. The essay thus generated can then be used as is, or as a starting point that the student can further edit. With this approach, students can easily generate a solid essay in a matter of minutes. Given the same prompt multiple times, the software will generate multiple versions of the same essay (see Figure 1). This allows students to select the version which best suits their needs, or even copy and paste sections from different versions to create a unique essay. When this method is applied, it is currently impossible to verify with 100% accuracy that the essay was entirely written by a chatbot.
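For readers curious how these knobs are exposed in practice, the sketch below maps them onto the fields of OpenAI's chat completion API as it stood in early 2023. The model name, prompt, and default values are illustrative assumptions, and the function only builds the request payload; the API itself may change.

```python
def build_request(prompt, n_versions=3, max_tokens=300, randomness=0.9):
    """Build a chat-completion request payload (field names as of early 2023)."""
    return {
        "model": "gpt-3.5-turbo",                           # AI model variant used by the chatbot
        "messages": [{"role": "user", "content": prompt}],  # the user's prompt
        "max_tokens": max_tokens,                           # caps the length of the bot's response
        "temperature": randomness,                          # level of randomness added to the essay
        "n": n_versions,                                    # number of alternative versions returned
    }

# The payload would then be sent with, e.g., openai.ChatCompletion.create(**request).
request = build_request("Write a paragraph about the theory of evolution.")
```

Asking for `n` completions of the same prompt is exactly the "multiple versions of the same essay" scenario described above, done in a single call.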

Figure 1. Asking ChatGPT about the theory of evolution. We asked ChatGPT to write a paragraph about the theory of evolution multiple times. For the first three queries, our question was the same – ChatGPT answered slightly differently each time. For the fourth, we also asked the bot to formulate its answer in a way that would be suitable for an expert in the field – which shows the level of language proficiency attainable by the software.

What are the concerns?

Chatbots make it easy for students to plagiarise without even realising it, as they might take the answer generated by a chatbot and submit it as their own work without citing the bot's sources. This type of plagiarism is especially difficult to detect because many chatbots add randomness to their models. Also, while a chatbot may create novel sentences or paragraphs, it can still provide users with ideas and phrases that are close to its original corpus. It is therefore crucial that users take steps to ensure that they are not plagiarising when using a chatbot. In the future, given that some chatbots specialise in finding references [7], we may soon see text-writing chatbots using referencing chatbots to source their essays!

Unlike humans, chatbots are limited in their ability to understand the context of a conversation, so they can provide incorrect answers to questions or give misleading information. Chatbots may also show a wide range of biases. For example, a chatbot might use language in a way that reinforces stereotypes or gender roles, or it may provide incorrect information about stigmatised or controversial topics [8,9,10]. Microsoft's Tay chatbot, released in 2016, was an artificial intelligence project created to interact with people on Twitter. It was designed to learn from conversations with real people and become smarter over time. Less than a day after its release, Tay was taken offline after it began making controversial and offensive statements [11].

Image generated with DALL-E (OpenAI) using the prompt "An oil painting of a classroom of student robots with a professor in the style of Henri Rovel" © OpenAI.

A particularly distressing concern lies in the possibility that the use of chatbots could lead to a lack of critical thinking skills. As chatbots become more advanced, they may be able to provide students with answers to their questions without requiring them to think for themselves. This could turn students into passive learners, which would not only be a detriment to their educational development but could also lead to a decrease in creativity.

Should educators be concerned?

Chatbots may seem new and exciting, but the technology itself has been around for decades. Chances are that you read AI-generated text on a regular basis without knowing it. News organisations such as the Associated Press or the Washington Post, for instance, use chatbots to generate short news articles. While the Associated Press turned to a commercially available solution, Wordsmith, in 2014 [12], the Washington Post has been using its own in-house chatbot, Heliograf, since at least 2017 [13].

The quality of the answers provided by chatbots has substantially increased in the past few years, and AI-generated texts, even in academic settings, are now difficult to differentiate from human-written texts [14]. Indeed, although frowned upon by the scientific community, ChatGPT has (albeit provocatively) been listed as a full-fledged author on some scientific papers [15].

News agencies use chatbots to generate short news articles.

Also, while chatbots can (and will [16,17]) be used to cheat, they are just one more tool on the student's belt. Even without considering the recently gained popularity of ChatGPT, there are several ways students can cheat on their homework, such as copying answers from classmates, looking up and plagiarising answers from online resources, or even hiring someone to do the work for them. In other words: where there is a will to cheat, there is a way.

How can educators act? 

One of the very first steps educators may take against the malicious use of chatbots is to adopt new regulations, whether as a course policy or, even better, at the school level [18]. Updating the standards of conduct would certainly increase students' and educators' awareness of the issue. It may also discourage many students from trying to cheat, for fear of the consequences. However, it would hardly solve the problem in its entirety.

How about changing the way we test students? One could imagine new, creative types of assignments that may not be easily solved by chatbots. While tempting, this solution raises two issues. On the one hand, AI-based technologies, especially chatbots, are a flourishing field, so a teacher's efforts to adapt their assignments may very well be ruined by the next chatbot software update. On the other hand, forms of questioning that would be considered "chatbot-friendly", such as written essays and quizzes, are invaluable tools for educators to test skills like comprehension, analysis, or synthesis [19]. New, innovative questioning strategies are always welcome, but they should not be the only solution.

Another solution, yet to be explored, is statistical watermarking [20]. Statistical watermarking is a digital watermarking technique used to embed a hidden message or data within a digital signal. In the case of chatbots, the watermark would be a set of non-random probabilities of picking certain words or phrases, designed to be undetectable to the human eye yet still recognisable by computers. Statistical watermarking could thus be used to detect chatbot-generated text.

Statistical watermarking is a type of digital watermarking technique used to embed a hidden message or data within a digital signal.

However, this approach has various drawbacks that severely limit its usage in the classroom. For instance, tech companies may be reluctant to implement statistical watermarking because of the reputational and legal risks if their chatbot were associated with reprehensible actions such as terrorism or cyberbullying. In addition, statistical watermarking works only if the cheating student copy-pastes a large portion of text. If they edit the chatbot-generated essay, or if the text is too short to run a statistical analysis, then watermarking is useless.
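To make the watermarking idea concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: a hash of the previous word deterministically selects a "green" half of a made-up vocabulary, the "language model" is just a weighted random draw that favours green words, and detection is a z-test on the fraction of green words. Real schemes bias a language model's actual token probabilities, but the statistical logic is the same.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(100)]  # toy vocabulary of 100 "words"

def green_list(prev_word):
    # Hash the previous word to pick a pseudo-random half of the vocabulary.
    # Anyone who knows the scheme can recompute this list; readers cannot see it.
    seed = int(hashlib.sha256(prev_word.encode()).hexdigest(), 16)
    return set(random.Random(seed).sample(VOCAB, len(VOCAB) // 2))

def generate(length, boost=8.0, seed=0):
    # Toy "language model": uniform over VOCAB, with green words boosted.
    rng = random.Random(seed)
    text = ["w0"]
    for _ in range(length):
        green = green_list(text[-1])
        weights = [boost if w in green else 1.0 for w in VOCAB]
        text.append(rng.choices(VOCAB, weights=weights)[0])
    return text

def green_fraction(text):
    hits = sum(1 for prev, word in zip(text, text[1:]) if word in green_list(prev))
    return hits / (len(text) - 1)

def z_score(text):
    # Under the null hypothesis (no watermark), each word is green with p = 0.5.
    n = len(text) - 1
    return (green_fraction(text) - 0.5) * math.sqrt(n) / 0.5
```

On a 200-word watermarked sample the z-score lands far above the threshold expected by chance, while unwatermarked text (boost=1.0) hovers near zero. The sketch also shows the two drawbacks mentioned above: a short text gives a small n and thus a weak z-score, and heavy editing dilutes the green fraction.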

How to detect AI-generated text? 

One way to detect AI-generated text is to look for unnatural or awkward phrasing and syntax. AI algorithms are generally limited in their ability to express ideas naturally, so generated text may have sentences that are overly long or too short. Additionally, chatbots may lack a natural flow of ideas and may use words or phrases in inappropriate contexts. In other words, their generated content may lack the depth and nuance of human-generated text [21]. This is especially true for long essays. Another concern we raised earlier regarding chatbots was the risk of plagiarism. As such, a simple way to detect AI-generated text is to look for the presence of such plagiarism [22]. Plagiarism-detecting engines are readily available.

In addition, people can detect AI-generated text by looking for the presence of a "statistical signature". On a basic level, chatbots are all designed to perform one task: they predict the words or phrases that are most likely to follow a user's given prompt. Therefore, at each position within the text, the words or phrases picked by the chatbot are very likely to be there. This is different from humans, who write answers and essays based on their cognitive abilities rather than probability charts, and hence may create uncommon word associations that still make sense. Put simply, a human's answer to a given question should be less predictable, or more creative, than a chatbot's.

This difference in statistical signature may be used to detect whether a sequence of words is more predictable (a statistical signature of chatbots) or creative (hence likely human). Some programs already exist, such as the Giant Language model Test Room (GLTR), developed jointly by MIT and Harvard University using the previous version of OpenAI's language model, GPT-2. We tested GLTR with short essays either written by some of our own students or generated by ChatGPT. We are happy to report that our students' answers were easily distinguishable from the chatbot's (see box below)!

Since GLTR, other AI-detecting programs have emerged, such as OpenAI-Detector, a program released shortly after GLTR and based on similar principles, or GPTZero, a commercial venture initially created by a college student in 2023. Soon, we hope to see the emergence of new tools to detect chatbot-generated text, more tailored to the needs of educators, similar to the readily available plagiarism detection engines.

To cheat or to chat?

To end on a positive note, let's not forget that most students willingly complete their assignments without cheating. The first preventive action should be to motivate students by explaining why the knowledge and skills taught during the course are important, useful, and interesting [23]. Calculators did not put math teachers out of a job. Google did not cause schools to shut down. Likewise, we believe that educators will adapt to chatbots which, despite the legitimate concerns they raise, may soon prove invaluable in many ways. With the proper framework and guidance, chatbots can become powerful teaching and studying assistants, as well as invaluable tools for businesses.

As such, educators should take the initiative to familiarise their students with chatbots, help them understand the potential and limits of this technology, and teach them how to use chatbots in an efficient, yet responsible and ethical way.

Statistical signature could be used to detect chatbot-generated essays.

The experiment: As part of a neuroscience course given at Sup'Biotech in Fall 2022, we gathered the written answers of 51 students to the following question: "Briefly define the term 'receptive field', then explain how you would measure the receptive field of a neuron in the somatosensory cortex of a cat." The question was part of a take-home, open-book, timed quiz taken on the course website. In parallel, we asked ChatGPT to answer the same question 10 times, to obtain 10 different chatbot answers. We used GLTR to compare the statistical signatures of the students' and the chatbot's answers.

How GLTR works: For each position in the text, GLTR looks at which words a chatbot (specifically GPT-2, an older version of ChatGPT's model) would have picked, before comparing them to the actual word. For example, in the text "Biology is great!", the word "great" is ranked 126th among all possible words that the chatbot could have chosen (the top chatbot choice being "a"). GLTR then generates a histogram of all rankings, which may be used as a simple form of statistical signature: GPT-2-generated texts are dominated by top-ranked words (those the model considered most likely), while human-written text contains a greater proportion of low-ranked, less predictable words.
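The ranking step can be illustrated with a toy next-word model. The frequency counts below are made-up stand-ins for GPT-2's predicted probabilities; only the ranking and histogram logic mirrors what GLTR does.

```python
from collections import Counter

# Toy "language model": invented next-word frequency counts standing in
# for GPT-2's predictions (illustrative assumption, not real data).
NEXT_WORD_COUNTS = {
    "biology": Counter({"is": 50, "and": 20, "of": 15, "great": 1}),
    "is": Counter({"a": 60, "the": 30, "great": 5, "boring": 2}),
}

def rank_of(prev_word, word):
    # Rank of `word` among the model's predictions after `prev_word`
    # (rank 1 = the model's top choice, as in GLTR's colour coding).
    predictions = [w for w, _ in NEXT_WORD_COUNTS[prev_word].most_common()]
    return predictions.index(word) + 1

def rank_histogram(words):
    # Histogram of ranks over a whole text: machine-generated text is
    # dominated by rank-1 and rank-2 words, human text shows deeper ranks.
    return Counter(rank_of(p, w) for p, w in zip(words, words[1:]))
```

Here `rank_of("is", "a")` returns 1 (the model's top pick) while `rank_of("is", "great")` returns 3, so a text made almost entirely of rank-1 choices carries the predictable signature of a chatbot.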

Panel A: Two exemplar answers, one from an actual student, the other from ChatGPT. The texts are coloured based on GLTR ranking. The histograms on the right show their statistical signatures. Note that the human response contains more low rankings than the chatbot's.

Panel B: We overlaid the histograms obtained from all 51 students' and all 10 chatbot answers (in blue and red, respectively). Again, we notice a clear difference between the human and ChatGPT texts. In other words, based on visual inspection of the statistical signatures, we are quite confident that our students did not use ChatGPT to answer the question.

1. Ina. The History Of Chatbots – From ELIZA to ChatGPT. In Onlim.com. Published 03-15-2022. Retrieved 01-19-2023.
2. Thorbecke C. Chatbots: A long and complicated history. In CNN Business. Published 08-20-2022. Retrieved 01-19-2023.
3. Marr B. What Does ChatGPT Really Mean For Businesses? In Forbes. Published 12-28-2022. Retrieved 01-19-2023.
4. Timothy M. 11 Things You Can Do With ChatGPT. In MakeUseOf.com. Published 12-20-2022. Retrieved 01-19-2023.
5. Bushway A, Nash WR (1977). School Cheating Behavior. Review of Educational Research, 47(4), 623–632.
6. Holmes A. Microsoft and OpenAI Working on ChatGPT-Powered Bing in Challenge to Google. In The Information. Published 01-03-2023. Retrieved 01-19-2023.
7. Vincze J (2017). Virtual Reference Librarians (Chatbots). Library Hi Tech News, 34(4), 5–8.
8. Feine J et al. (2020). Gender Bias in Chatbot Design. Conversations 2019. Lecture Notes in Computer Science, vol 11970. Springer, Cham.
9. Haroun O. Racist Chatbots & Sexist Robo-Recruiters: Decoding Algorithmic Bias. In The AI Journal. Published 10-11-2023. Retrieved 01-19-2023.
10. Biddle S. The Internet's New Favorite AI Proposes Torturing Iranians and Surveilling Mosques. In The Intercept. Published 12-08-2022. Retrieved 01-19-2023.
11. Vincent J. Twitter taught Microsoft's AI chatbot to be a racist asshole in less than a day. In The Verge. Published 03-24-2016. Retrieved 01-19-2023.
12. Miller R. AP's 'robot journalists' are writing their own stories now. In The Verge. Published 01-29-2015. Retrieved 01-19-2023.
13. Moses L. The Washington Post's robot reporter has published 850 articles in the past year. In Digiday.com. Published 09-14-2017. Retrieved 01-19-2023.
14. Else H (2023). Abstracts written by ChatGPT fool scientists. Nature, 613(7944), 423.
15. Stokel-Walker C (2023). ChatGPT listed as author on research papers: many scientists disapprove. Nature (retrieved online ahead of print on 01-23-2023).
16. Gordon B. North Carolina Professors Catch Students Cheating With ChatGPT. In Government Technology. Published 01-12-2023. Retrieved 01-19-2023.
17. Nolan B. Two professors who say they caught students cheating on essays with ChatGPT explain why AI plagiarism can be hard to prove. In Insider. Published 01-14-2023. Retrieved 01-19-2023.
18. Johnson A. ChatGPT In Schools: Here's Where It's Banned—And How It Could Potentially Help Students. In Forbes. Published 01-18-2023. Retrieved 01-19-2023.
19. Krathwohl DR (2002). A revision of Bloom's taxonomy: An overview. Theory Into Practice, 41(4), 212–218.
20. Aaronson S. My AI Safety Lecture for UT Effective Altruism. In Shtetl-Optimized, The Blog of Scott Aaronson. Posted 11-29-2022. Retrieved 01-19-2023.
21. Bogost I. ChatGPT Is Dumber Than You Think. In The Atlantic. Published 12-07-2022. Retrieved 01-19-2023.
22. Mollenkamp D. Can Anti-Plagiarism Tools Detect When AI Chatbots Write Student Essays? In EdSurge. Published 12-21-2022. Retrieved 01-19-2023.
23. Shrestha G (2020). Importance of Motivation in Education. International Journal of Science and Research, 9(3), 91–93.
