
Cheating or chatting: is ChatGPT a threat to education?

Julien Grimaud
Assistant Professor of Life Sciences at Sup’Biotech

Pavla Debeljak
Assistant Professor of Bioinformatics at Sup’Biotech

Frank Yates
Director of Research at Sup’Biotech Engineering School
Key takeaways
  • ChatGPT is a chatbot, i.e. a computer program designed to simulate a conversation with a human, which produces convincing and natural texts.
  • Educators are therefore concerned about the risks of students using chatbots, for example by asking ChatGPT to write their essays.
  • Tools exist to identify whether a text has been written by a chatbot or not, but it is currently impossible to be 100% sure.
  • To identify whether a text has been generated by an AI, it is possible to track down strange wording, unnatural syntax, or instances of plagiarism.
  • With the right guidance, chatbots can nevertheless become powerful allies for teaching and studying, but also for the professional world.

Commonly used in customer service and marketing, as well as for gaming and education, chatbots have been around for decades [1, 2]. The first ever chatbot, ELIZA, developed in the 1960s at MIT’s Artificial Intelligence Laboratory, was designed to simulate a psychotherapist, using natural language processing to respond to user input. Sixty years on, chatbots are becoming increasingly sophisticated, using AI to understand user input and thus provide more natural and intelligent conversations. As technology continues to progress, chatbots are likely to become even more advanced, allowing for even more natural and personalised conversations across a variety of industries, from healthcare to finance [3].

ChatGPT, released to the public on November 30th, 2022, is a chatbot – a computer program designed to simulate conversation with a human – developed by the San Francisco-based company OpenAI. As its name suggests, it relies on GPT (Generative Pre-trained Transformer), a type of artificial intelligence (AI) model trained on a large amount of text data and used to generate new text in response to users’ prompts. ChatGPT has become popular because of its ability to generate convincing and engaging text from natural language queries, which has made it a useful and user-friendly tool for tasks like content creation, automated customer support, and natural language processing [4]. As such, educators are questioning whether the use of chatbots by students is a risk. Moreover, just a few days ago OpenAI released GPT‑4, the successor to the GPT‑3.5 model underlying ChatGPT. It remains to be seen how much more advanced this new version is than the previous one.

Could students use chatbots in a malicious way? 

While cheating is an age-old problem in education [5], AI-based chatbots represent a new route for those willing to cheat by asking questions about assignments or tests. For example, instead of using the reading material provided by the professor, a student might use a chatbot to ask for help with a math problem or to get the answer to a multiple-choice question. Note that this is similar to typing a question into a search engine like Google or Bing (which may soon incorporate ChatGPT [6]). Whether this rather mundane action is considered cheating is up to the teacher.

While cheating is an age-old problem in education, AI-based chatbots represent a new route for those willing to cheat.

Furthermore, some chatbots are specialised in solving certain types of problems. DeepL Translate, for instance, is an online AI-based language translation service, which allows users to translate text, websites, and documents into different languages with high accuracy and speed. Other chatbots specialise in writing computer code, including Codebots and Autocode. While these chatbots were initially designed to assist well-intentioned users with tedious or repetitive tasks, they have the potential to be diverted from their original purpose by students willing to cheat.

Besides answering short questions, pre-trained AI can be used to generate essays with a semblance of erudition. Paraphrasing tools such as Quillbot, Paperpal, or WordAI have been available for several years and can convincingly turn a poorly written manuscript into a decent academic paper, or indeed rewrite an original text to escape plagiarism detection. More concerning, however, is the ability of some chatbots to generate lengthy, human-looking essays in seconds, in response to a short prompt.

In ChatGPT, students can very simply adjust various parameters, such as the length of the bot’s response, the level of randomness added to the essay, or the AI model variant used by the chatbot. The essay thus generated can then be used as is, or as a starting point that the student further edits. With this approach, students can easily produce a solid essay in a matter of minutes. By providing a chatbot with the same prompt multiple times, the software will generate multiple versions of the same essay (see Figure 1). This allows students to select the version which best suits their needs, or even copy and paste sections from different versions to create a unique essay. When this method is applied, it is currently impossible to verify with 100% accuracy that the essay was entirely written by a chatbot.

Figure 1: Asking ChatGPT about the theory of evolution. We asked ChatGPT to write a paragraph about the theory of evolution multiple times. For the first three queries, our question was the same – ChatGPT answered slightly differently each time. For the fourth, we also asked the bot to formulate its answer in a way that would be suitable for an expert in the field – which shows the extent of language proficiency attainable by the software.
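
As an illustration of how easily these parameters can be adjusted, here is a minimal sketch using OpenAI’s API, the programmatic counterpart to the ChatGPT web interface. It assumes the openai Python package (pre-1.0 interface) and a valid API key; the model name, prompt, and parameter values are illustrative assumptions, not recommendations.

    import openai

    openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",    # AI model variant
        messages=[{"role": "user",
                   "content": "Write a paragraph about the theory of evolution."}],
        temperature=0.9,          # level of randomness added to the text
        max_tokens=300,           # approximate length of the response
        n=3,                      # number of alternative versions to generate
    )

    for i, choice in enumerate(response.choices, start=1):
        print(f"--- Version {i} ---\n{choice.message.content}\n")

Re-running the same request, or raising the temperature, yields different drafts each time, which is exactly the behaviour illustrated in Figure 1.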

What are the concerns?

Chatbots make it easy for students to plagiarise without even realising it, as they might take the answer generated by a chatbot and submit it as their own work without citing the bot’s sources. This type of plagiarism is especially difficult to detect because many chatbots add randomness to their models. Also, while a chatbot may create novel sentences or paragraphs, it can still provide users with ideas and phrases that are close to its original training corpus. It is therefore crucial that users take steps to ensure that they are not plagiarising when using a chatbot. Given that some chatbots specialise in finding references [7], we may soon see text-writing chatbots using referencing chatbots to source their essays!

Unlike humans, chatbots are limited in their ability to understand the context of a conversation, so they can provide incorrect answers to questions or give misleading information. Chatbots may also show a wide range of biases. For example, a chatbot might use language in a way that reinforces stereotypes or gender roles, or it may provide incorrect information about stigmatised or controversial topics [8, 9, 10]. Microsoft’s Tay chatbot, released in 2016, was an artificial intelligence project created to interact with people on Twitter. It was designed to learn from conversations with real people and become smarter over time. Less than a day after its release, Tay was taken offline after it began making controversial and offensive statements [11].

Image generated with DALL‑E (OpenAI) using the prompt “An oil painting of a classroom of student robots with a professor in the style of Henri Rovel” © OpenAI.

A particularly distressing concern lies in the possibility that the use of chatbots could lead to a lack of critical thinking skills. As chatbots become more advanced, they may be able to provide students with the answers to their questions without requiring them to think for themselves. This could lead to students becoming passive learners, which would be a detriment to their educational development and could also lead to a decrease in creativity.

Should educators be concerned?

Chatbots may seem new and exciting, but the technology itself has been around for decades. Chances are that you read AI-generated text on a regular basis without knowing it. News agencies such as the Associated Press or the Washington Post, for instance, use chatbots to generate short news articles. While the Associated Press turned to a commercially available solution, Wordsmith, in 2014 [12], the Washington Post has been using its own in-house chatbot, Heliograf, since at least 2017 [13].

The quality of the answers provided by chatbots has substantially increased in the past few years, and AI-generated texts, even in academic settings, are now difficult to differentiate from human-written texts [14]. Indeed, although frowned upon by the scientific community, ChatGPT has (albeit provocatively) been listed as a full-fledged author on some scientific papers [15].

News agencies use chatbots to generate short news articles.

Also, while chatbots can (and will [16, 17]) be used to cheat, they are just one more tool in the student’s belt. Even without considering the recently gained popularity of ChatGPT, there are several ways students can cheat on their homework, such as copying answers from classmates, using online resources to look up and plagiarise answers, or even hiring someone to do the work for them. In other words: where there is a will to cheat, there is a way.

How can educators act? 

One of the very first steps educators may take against the malicious use of chatbots is to adopt new regulations, whether as a course policy or, even better, at school level [18]. Updating the standards of conduct would certainly increase students’ and educators’ awareness of the issue. It may also discourage many students from trying to cheat, for fear of the consequences. However, it would hardly solve the problem in its entirety.

How about changing the way we test students? One could imagine new, creative types of assignments that may not be easily solved by chatbots. While tempting, this solution presents two issues. On one hand, AI-based technologies, especially chatbots, are a flourishing field, so a teacher’s efforts to adapt their assignments may very well be undone by the next chatbot software update. On the other hand, forms of questioning that would be considered “chatbot friendly”, such as written essays and quizzes, are invaluable tools for educators to test skills like comprehension, analysis, or synthesis [19]. New, innovative questioning strategies are always welcome, but they should not be the only solution.

Another solution yet to be explored is statistical watermarking [20]. Statistical watermarking is a type of digital watermarking technique used to embed a hidden message or data within a digital signal. In the case of chatbots, the watermark would be a set of non-random probabilities for picking certain words or phrases, designed to be undetectable to the human eye, yet still recognisable by computers. Statistical watermarking could thus be used to detect chatbot-generated text.

Stat­ist­ic­al water­mark­ing is a type of digit­al water­mark­ing tech­nique used to embed a hid­den mes­sage or data with­in a digit­al signal.
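
To give a sense of how such a scheme might work, here is a toy sketch in Python. A generator holding a secret key would favour words from a pseudorandomly chosen “green list” at each step; a detector holding the same key can then test whether a text contains suspiciously many green words. The key, the hashing choice, and the 0.5 baseline are purely illustrative assumptions, and real proposals, such as the one discussed by Aaronson [20], operate on model tokens and probabilities rather than plain words.

    import hashlib

    SECRET_KEY = b"hypothetical-secret"

    def is_green(previous_word, candidate):
        # Pseudorandomly assign roughly half of the vocabulary to the "green list",
        # keyed on the secret and the previous word.
        digest = hashlib.sha256(
            SECRET_KEY + previous_word.encode() + candidate.encode()).digest()
        return digest[0] % 2 == 0

    def green_fraction(text):
        # Detector: fraction of word transitions that land on the green list.
        words = text.lower().split()
        hits = sum(is_green(prev, cur) for prev, cur in zip(words, words[1:]))
        return hits / max(len(words) - 1, 1)

    # Ordinary text should hover around 0.5; a watermarked generator that
    # systematically favours green words would push this fraction much higher.
    print(green_fraction("Evolution is the change in heritable characteristics of populations"))

A statistical test on the green fraction (for instance, a binomial test) would then decide whether the bias is strong enough to attribute the text to the watermarked model.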

However, this approach has various drawbacks that severely limit its usage in the classroom. For instance, tech companies may be reluctant to implement statistical watermarking, because of the reputational and legal risks if their chatbot were associated with reprehensible actions such as terrorism or cyberbullying. In addition, statistical watermarking works only if the cheating student copy-pastes a large portion of text. If they edit the chatbot-generated essay, or if the text is too short to run a statistical analysis, then watermarking is useless.

How to detect AI-generated text? 

One way to detect AI-generated text is to look for unnatural or awkward phrasing and syntax. AI algorithms are generally limited in their ability to express ideas naturally, so their generated text may have sentences that are overly long or too short. Additionally, chatbots may lack a natural flow of ideas and may use words or phrases in inappropriate contexts. In other words, their generated content may lack the depth and nuance of human-generated text [21]. This is especially true for long essays. Another concern we raised earlier regarding chatbots was the risk of plagiarism. As such, a simple way to detect AI-generated text is to look for the presence of such plagiarism [22]. Plagiarism-detection engines are readily available.

In addition, people can detect AI-generated text by looking for the presence of a “statistical signature”. On a basic level, chatbots are all designed to perform one task: they predict the words or phrases that are the most likely to follow a user’s given prompt. Therefore, at each position within the text, the words or phrases picked by the chatbot are very likely to be there. This is different from humans, who write answers and essays based on their cognitive abilities rather than probability charts, and hence may create uncommon word associations that still make sense. Put simply, a human’s answer to a given question should be less predictable, or more creative, than a chatbot’s.

This difference in statistical signature may be used to detect whether a sequence of words is more predictable (a statistical signature of chatbots) or more creative (hence likely human). Some programs already exist, such as the Giant Language model Test Room (GLTR), developed jointly by MIT and Harvard University using an earlier OpenAI language model, GPT‑2. We tested GLTR with short essays either written by some of our own students or generated by ChatGPT. We are happy to report that our students’ answers were easily distinguishable from the chatbot’s (see box below)!

Since GLTR, other AI-detecting programs have emerged, such as OpenAI-Detector, a program released shortly after GLTR and based on similar principles, or GPTZero, a commercial venture initially created by a college student in 2023. We hope to soon see the emergence of new tools to detect chatbot-generated text, more tailored to the needs of educators, similar to the readily available plagiarism detection engines.

To cheat or to chat?

To end on a positive note, let’s not forget that most students willingly complete their assignments without cheating. The first preventive action should be to motivate students by explaining why the knowledge and skills taught during the course are important, useful, and interesting [23]. Calculators did not put math teachers out of a job. Google did not cause schools to shut down. Likewise, we believe that educators will certainly adapt to chatbots which, despite the legitimate concerns they raise, may soon prove invaluable in many ways. With the proper framework and guidance, chatbots can become powerful teaching and studying assistants, as well as invaluable tools for businesses.

As such, educators should take the initiative to familiarise their students with chatbots, help them understand the potential and limits of this technology, and teach them how to use chatbots in an efficient, yet responsible and ethical way.

A statistical signature could be used to detect chatbot-generated essays.

The experiment: As part of a neuroscience course given at Sup’Biotech in Fall 2022, we gathered the written answers of 51 students to the following question: “Briefly define the term ‘receptive field’, then explain how you would measure the receptive field of a neuron in the somatosensory cortex of a cat.” The question was part of a take-home, open-book, timed quiz taken on the course website. In parallel, we asked ChatGPT to answer the same question 10 times, to obtain 10 different chatbot answers. We used GLTR to compare the statistical signatures of the students’ and the chatbot’s answers.

How GLTR works: For each position in the text, GLTR looks at what a chatbot (specifically GPT‑2, an earlier OpenAI language model) would have picked, and compares it to the actual word. For example, in the text “Biology is great!”, the word “great” is ranked 126th among all possible words that the chatbot could have chosen (the top chatbot choice being “a”). GLTR then generates a histogram of all rankings, which may be used as a simple form of statistical signature: GPT‑2-generated texts are dominated by highly ranked words (those near the top of the model’s prediction list), while human-written texts contain a greater proportion of low-ranked, less predictable words.
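
For readers who want to experiment, here is a minimal sketch of the same idea using the Hugging Face transformers library and the publicly available GPT‑2 model: for each token in a text, it computes the rank of the actual token among the model’s predictions, and these ranks can then be summarised in a histogram. This illustrates the principle rather than reproducing the exact GLTR implementation, and the example text is arbitrary.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def token_ranks(text):
        # Encode the text and obtain the model's predictions for every position.
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        ranks = []
        for pos in range(ids.shape[1] - 1):
            next_id = ids[0, pos + 1]
            # Rank of the actual next token among all candidates (1 = top choice).
            rank = int((logits[0, pos] > logits[0, pos, next_id]).sum()) + 1
            ranks.append(rank)
        return ranks

    print(token_ranks("Biology is great!"))
    # Ranks close to 1 indicate highly predictable, chatbot-like word choices;
    # a large share of high rank numbers points towards a human author.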

Panel A: Two exemplar answers, one from an actual student, the other from ChatGPT. The texts are coloured based on GLTR ranking. The histograms on the right show their statistical signatures. Note that the human response contains more low-ranked (less predictable) words than the chatbot’s.

Panel B: We overlaid the histograms obtained from all 51 student answers and all 10 chatbot answers (in blue and red, respectively). Again, we notice a clear difference between the human and ChatGPT texts. In other words, based on visual inspection of the statistical signatures, we are quite confident that our students did not use ChatGPT to answer the question.

1. Ina. The History Of Chatbots – From ELIZA to ChatGPT. In Onlim.com. Published 03-15-2022. Retrieved 01-19-2023.
2. Thorbecke C. Chatbots: A long and complicated history. In CNN Business. Published 08-20-2022. Retrieved 01-19-2023.
3. Marr B. What Does ChatGPT Really Mean For Businesses? In Forbes. Published 12-28-2022. Retrieved 01-19-2023.
4. Timothy M. 11 Things You Can Do With ChatGPT. In MakeUseOf.com. Published 12-20-2022. Retrieved 01-19-2023.
5. Bushway A, Nash WR (1977). School Cheating Behavior. Review of Educational Research, 47(4), 623–632.
6. Holmes A. Microsoft and OpenAI Working on ChatGPT-Powered Bing in Challenge to Google. In The Information. Published 01-03-2023. Retrieved 01-19-2023.
7. Vincze J (2017). Virtual Reference Librarians (Chatbots). Library Hi Tech News, 34(4), 5–8.
8. Feine J et al. (2020). Gender Bias in Chatbot Design. Conversations 2019. Lecture Notes in Computer Science, vol 11970. Springer, Cham.
9. Haroun O. Racist Chatbots & Sexist Robo-Recruiters: Decoding Algorithmic Bias. In The AI Journal. Published 10-11-2023. Retrieved 01-19-2023.
10. Biddle S. The Internet's New Favorite AI Proposes Torturing Iranians and Surveilling Mosques. In The Intercept. Published 12-08-2022. Retrieved 01-19-2023.
11. Vincent J. Twitter taught Microsoft's AI chatbot to be a racist asshole in less than a day. In The Verge. Published 03-24-2016. Retrieved 01-19-2023.
12. Miller R. AP's 'robot journalists' are writing their own stories now. In The Verge. Posted 01-29-2015. Retrieved 01-19-2023.
13. Moses L. The Washington Post's robot reporter has published 850 articles in the past year. In Digiday.com. Posted 09-14-2017. Retrieved 01-19-2023.
14. Else H (2023). Abstracts written by ChatGPT fool scientists. Nature, 613(7944), 423.
15. Stokel-Walker C (2023). ChatGPT listed as author on research papers: many scientists disapprove. Nature (retrieved online ahead of print on 01-23-2023).
16. Gordon B. North Carolina Professors Catch Students Cheating With ChatGPT. In Government Technology. Published 01-12-2023. Retrieved 01-19-2023.
17. Nolan B. Two professors who say they caught students cheating on essays with ChatGPT explain why AI plagiarism can be hard to prove. In Insider. Published 01-14-2023. Retrieved 01-19-2023.
18. Johnson A. ChatGPT In Schools: Here's Where It's Banned—And How It Could Potentially Help Students. In Forbes. Published 01-18-2023. Retrieved 01-19-2023.
19. Krathwohl DR (2002). A revision of Bloom's taxonomy: An overview. Theory Into Practice, 41(4), 212–218.
20. Aaronson S. My AI Safety Lecture for UT Effective Altruism. In Shtetl-Optimized, The Blog of Scott Aaronson. Posted 11-29-2022. Retrieved 01-19-2023.
21. Bogost I. ChatGPT Is Dumber Than You Think. In The Atlantic. Published 12-07-2022. Retrieved 01-19-2023.
22. Mollenkamp D. Can Anti-Plagiarism Tools Detect When AI Chatbots Write Student Essays? In EdSurge. Published 12-21-2022. Retrieved 01-19-2023.
23. Shrestha G (2020). Importance of Motivation in Education. International Journal of Science and Research, 9(3), 91–93.
