WorldDigitalTechnologyAcademy(WDTA)
LargeLanguageModelSecurity
TestingMethod
WorldDigitalTechnologyAcademyStandard
WDTAAI-STR-02
Edition:2024-04©WDTA2024–Allrightsreserved.
TheWorldDigitalTechnologyStandardWDTAAI-STR-02isdesignatedasaWDTA
norm.ThisdocumentisthepropertyoftheWorldDigitalTechnologyAcademy(WDTA)andis
protectedbyinternationalcopyrightlaws.Anyuseofthisdocument,includingreproduction,
modification,distribution,orrepublication,withoutthepriorwrittenpermissionofWDTA,is
prohibited.WDTAisnotliableforanyerrorsoromissionsinthisdocument.
DiscovermoreWDTAstandardandrelatedpublicationsathttps://wdtacademy.org/.
VersionHistory*
StandardID Version Date Changes
WDTAAI-STR-02 1.0 2024-04 InitialReleaseForeword
The"LargeLanguageModelSecurityTestingMethod,"developedandissuedbytheWorldDigital
TechnologyAcademy(WDTA),representsacrucialadvancementinourongoingcommitmentto
ensuringtheresponsibleandsecureuseofartificialintelligencetechnologies.AsAIsystems,
particularlylargelanguagemodels,continuetobecomeincreasinglyintegraltovariousaspectsof
society,theneedforacomprehensivestandardtoaddresstheirsecuritychallengesbecomes
paramount.Thisstandard,anintegralpartofWDTA'sAISTR(Safety,Trust,Responsibility)program,
isspecificallydesignedtotacklethecomplexitiesinherentinlargelanguagemodelsandprovide
rigorousevaluationmetricsandprocedurestotesttheirresilienceagainstadversarialattacks.
Thisstandarddocumentprovidesaframeworkforevaluatingtheresilienceoflargelanguagemodels
(LLMs)againstadversarialattacks.TheframeworkappliestothetestingandvalidationofLLMs
acrossvariousattackclassifications,includingL1Random,L2Blind-Box,L3Black-Box,andL4
White-Box.KeymetricsusedtoassesstheeffectivenessoftheseattacksincludetheAttackSuccess
Rate(R)andDeclineRate(D).Thedocumentoutlinesadiverserangeofattackmethodologies,such
asinstructionhijackingandpromptmasking,tocomprehensivelytesttheLLMs'resistanceto
differenttypesofadversarialtechniques.Thetestingproceduredetailedinthisstandarddocument
aimstoestablishastructuredapproachforevaluatingtherobustnessofLLMsagainstadversarial
attacks,enablingdevelopersandorganizationstoidentifyandmitigatepotentialvulnerabilities,and
ultimatelyimprovethesecurityandreliabilityofAIsystemsbuiltusingLLMs.
Byestablishingthe"LargeLanguageModelSecurityTestingMethod,"WDTAseekstoleadtheway
increatingadigitalecosystemwhereAIsystemsarenotonlyadvancedbutalsosecureandethically
aligned.Itsymbolizesourdedicationtoafuturewheredigitaltechnologiesaredevelopedwithakeen
senseoftheirsocietalimplicationsandareleveragedforthegreaterbenefitofall.
ExecutiveChairmanofWDTA
WDTA AI-STR-02-LLM security Large Language Model Security
文档预览
中文文档
22 页
50 下载
1000 浏览
0 评论
309 收藏
3.0分
温馨提示:本文档共22页,可预览 3 页,如浏览全部内容或当前文档出现乱码,可开通会员下载原始文档
本文档由 人生无常 于 2024-05-12 13:16:23上传分享