GPT-4o System Card

OpenAI

August 8, 2024
1 Introduction

GPT-4o [1] is an autoregressive omni model, which accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network.

GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time [2] in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.

In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House [3], we are sharing the GPT-4o System Card, which includes our Preparedness Framework [4] evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, with a focus on speech-to-speech (voice)¹ while also evaluating text and image capabilities, and the measures we've implemented to ensure the model is safe and aligned. We also include third party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o text and vision capabilities.
2 Model data and training

GPT-4o's text and voice capabilities were pre-trained using data up to October 2023, sourced from a wide variety of materials including:

- Select publicly available data, mostly collected from industry-standard machine learning datasets and web crawls.
- Proprietary data from data partnerships. We form partnerships to access non-publicly available data, such as pay-walled content, archives, and metadata. For example, we partnered with Shutterstock [5] on building and delivering AI-generated images.
¹ Some evaluations, in particular, the majority of the Preparedness Evaluations, third party assessments and some of the societal impacts focus on the text and vision capabilities of GPT-4o, depending on the risk assessed. This is indicated accordingly throughout the System Card.
The key dataset components that contribute to GPT-4o's capabilities are:

- Web Data: Data from public web pages provides a rich and diverse range of information, ensuring the model learns from a wide variety of perspectives and topics.
- Code and Math: Including code and math data in training helps the model develop robust reasoning skills by exposing it to structured logic and problem-solving processes.
- Multimodal Data: Our dataset includes images, audio, and video to teach the LLMs how to interpret and generate non-textual input and output. From this data, the model learns how to interpret visual images, actions and sequences in real-world contexts, language patterns, and speech nuances.

Prior to deployment, OpenAI assesses and mitigates potential risks that may stem from generative models, such as information harms, bias and discrimination, or other content that violates our usage policies. We use a combination of methods, spanning all stages of development across pre-training, post-training, product development, and policy. For example, during post-training, we align the model to human preferences; we red-team the resulting models and add product-level mitigations such as monitoring and enforcement; and we provide moderation tools and transparency reports to our users.

We find that the majority of effective testing and mitigations are done after the pre-training stage because filtering pre-trained data alone cannot address nuanced and context-specific harms. At the same time, certain pre-training filtering mitigations can provide an additional layer of defense that, along with other safety mitigations, help exclude unwanted and harmful information from our datasets:

- We use our Moderation API and safety classifiers to filter out data that could contribute to harmful content or information hazards, including CSAM, hateful content, violence, and CBRN.
- As with our previous image generation systems, we filter our image generation datasets for explicit content such as graphic sexual material and CSAM.
- We use advanced data filtering processes to reduce personal information from training data.
- Upon releasing DALL·E 3, we piloted a new approach to give users the power to opt images out of training. To respect those opt-outs, we fingerprinted the images and used the fingerprints to remove all instances of the images from the training dataset for the GPT-4o series of models.
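The fingerprint-and-remove step described above can be sketched as follows. This is an illustrative sketch, not OpenAI's actual pipeline: the names `fingerprint` and `filter_opted_out` are hypothetical, and a cryptographic hash stands in for the fingerprint. A production system would more plausibly use a perceptual hash, so that re-encoded or resized copies of an opted-out image still match.

```python
import hashlib


def fingerprint(image_bytes: bytes) -> str:
    # Stand-in fingerprint: a cryptographic hash of the raw bytes.
    # A real system would likely use a perceptual hash that is robust
    # to re-encoding, resizing, and minor edits of the image.
    return hashlib.sha256(image_bytes).hexdigest()


def filter_opted_out(dataset: list[bytes], opt_out_images: list[bytes]) -> list[bytes]:
    """Drop every training image whose fingerprint matches an opted-out image."""
    blocked = {fingerprint(img) for img in opt_out_images}
    return [img for img in dataset if fingerprint(img) not in blocked]
```

The set lookup makes the filtering pass linear in the dataset size regardless of how many opt-outs have been registered.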
3 Risk identification, assessment and mitigation

Deployment preparation was carried out via identifying potential risks of speech-to-speech models, exploratory discovery of additional novel risks through expert red teaming, turning the identified risks into structured measurements and building mitigations for them. We also evaluated GPT-4o in accordance with our Preparedness Framework [4].
3.1 External red teaming

OpenAI worked with more than 100 external red teamers², speaking a total of 45 different languages, and representing geographic backgrounds of 29 different countries. Red teamers had access to various snapshots of the model at different stages of training and safety mitigation maturity starting in early March and continuing through late June 2024.

External red teaming was carried out in four phases. The first three phases tested the model via an internal tool and the final phase used the full iOS experience for testing the model. At the time of writing, external red teaming of the GPT-4o API is ongoing.
Phase 1

- 10 red teamers working on early model checkpoints still in development
- This checkpoint took in audio and text as input and produced audio and text as outputs.
- Single-turn conversations

Phase 2

- 30 red teamers working on model checkpoints with early safety mitigations
- This checkpoint took in audio, image & text as inputs and produced audio and text as outputs.
- Single & multi-turn conversations

Phase 3

- 65 red teamers working on model checkpoints & candidates
- This checkpoint took in audio, image, and text as inputs and produced audio, image, and text as outputs.
- Improved safety mitigations tested to inform further improvements
- Multi-turn conversations

Phase 4

- 65 red teamers working on final model candidates & assessing comparative performance
- Model access via advanced voice mode within iOS app for real user experience; reviewed and tagged via internal tool.
- This checkpoint took in audio and video prompts, and produced audio generations.
- Multi-turn conversations in real time
Red teamers were asked to carry out exploratory capability discovery, assess novel potential risks posed by the model, and stress test mitigations as they are developed and improved - specifically those introduced by audio input and generation (speech-to-speech capabilities). This red teaming effort builds upon prior work, including as described in the GPT-4 System Card [6] and the GPT-4(V) System Card [7].

Red teamers covered categories that spanned violative and disallowed content (illegal erotic content, violence, self harm, etc.), mis/disinformation, bias, ungrounded inferences, sensitive trait attribution, private information, geolocation, person identification, emotional perception and anthropomorphism risks, fraudulent behavior and impersonation, copyright, natural science capabilities, and multilingual observations.

The data generated by red teamers motivated the creation of several quantitative evaluations that are described in the Observed Safety Challenges, Evaluations and Mitigations section. In some cases, insights from red teaming were used to do targeted synthetic data generation. Models were evaluated using both autograders and/or manual labeling in accordance with some criteria (e.g., violation of policy or not, refused or not). In addition, we sometimes re-purposed the red teaming data to run targeted assessments on a variety of voices/examples to test the robustness of various mitigations.

² Spanning self-reported domains of expertise including: Cognitive Science, Chemistry, Biology, Physics, Computer Science, Steganography, Political Science, Psychology, Persuasion, Economics, Anthropology, Sociology, HCI, Fairness and Bias, Alignment, Education, Healthcare, Law, Child Safety, Cybersecurity, Finance, Mis/disinformation, Political Use, Privacy, Biometrics, Languages and Linguistics
3.2 Evaluation methodology

In addition to the data from red teaming, a range of existing evaluation datasets were converted to evaluations for speech-to-speech models using text-to-speech (TTS) systems such as Voice Engine [8]. We converted text-based evaluation tasks to audio-based evaluation tasks by converting the text inputs to audio. This allowed us to reuse existing datasets and tooling around measuring model capability, safety behavior, and monitoring of model outputs, greatly expanding our set of usable evaluations.

We used Voice Engine to convert text inputs to audio, feed it to GPT-4o, and score the outputs by the model. We always score only the textual content of the model output, except in cases where the audio needs to be evaluated directly, such as in evaluations for voice cloning (see Section 3.3.1).
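The convert-run-score loop described above can be sketched as follows. This is a hedged illustration, not the actual evaluation harness: the callables `tts`, `model`, and `grade_text` are hypothetical placeholders for a Voice Engine-style TTS system, the speech-to-speech model, and a transcript grader.

```python
def run_audio_eval(examples, tts, model, grade_text):
    """Convert each text prompt to audio, run the speech-to-speech model,
    and grade only the text transcript of the model's output.

    examples: iterable of (text_prompt, reference_answer) pairs.
    tts: text -> audio bytes (e.g. a Voice Engine-style call).
    model: audio bytes -> (audio_output, text_transcript).
    grade_text: (transcript, reference) -> score in [0, 1].
    Returns the mean score over all examples.
    """
    scores = []
    for prompt, reference in examples:
        audio_in = tts(prompt)                    # text eval input -> audio
        _audio_out, transcript = model(audio_in)  # audio output is ignored here
        scores.append(grade_text(transcript, reference))
    return sum(scores) / len(scores)
```

Because only the transcript is graded, the same graders built for text evaluations can be reused unchanged; audio-specific properties need the auxiliary classifiers discussed in Section 3.3.1.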
Limitations of the evaluation methodology

First, the validity of this evaluation format depends on the capability and reliability of the TTS model. Certain text inputs are unsuitable or awkward to be converted to audio; for instance, mathematical equations or code. Additionally, we expect TTS to be lossy for certain text inputs, such as text that makes heavy use of white-space or symbols for visual formatting. Since we expect that such inputs are also unlikely to be provided by the user over Advanced Voice Mode, we either avoid evaluating the speech-to-speech model on such tasks, or alternatively pre-process examples with such inputs. Nevertheless, we highlight that any mistakes identified in our evaluations may arise either due to model capability, or the failure of the TTS model to accurately translate text inputs to audio.

A second concern may be whether the TTS inputs are representative of the distribution of audio inputs that users are likely to provide in actual usage. We evaluate the robustness of GPT-4o on audio inputs across a range of regional accents in Section 3.3.3. However, there remain many other dimensions that may not be captured in a TTS-based evaluation, such as different voice intonations and valence, background noise, or cross-talk, that could lead to different model behavior in practical usage.

Lastly, there may be artifacts or properties in the model's generated audio that are not captured in text; for example, background noises and sound effects, or responding with an out-of-distribution voice. In Section 3.3.1, we illustrate using auxiliary classifiers to identify undesirable audio generation that can be used in conjunction with scoring transcripts.
3.3 Observed safety challenges, evaluations and mitigations

Potential risks with the model were mitigated using a combination of methods. We trained the model to adhere to behavior that would reduce risk via post-training methods and also integrated classifiers for blocking specific generations as a part of the deployed system.

For observed safety challenges outlined below, we provide a description of the risk, the mitigations applied, and results of relevant evaluations. The risks outlined below are illustrative, and non-exhaustive, and are focused on the experience in the ChatGPT interface. We focus on the risks that are introduced by speech-to-speech capabilities and how they may interact with pre-existing modalities (text, image)³.
Risk: Unauthorized voice generation
Mitigations:
- In all of our post-training audio data, we supervise ideal completions using the voice sample in the system message as the base voice.
- We only allow the model to use certain pre-selected voices and use an output classifier to detect if the model deviates from that.

Risk: Speaker identification
Mitigations:
- We post-trained GPT-4o to refuse to comply with requests to identify someone based on a voice in an audio input, while still complying with requests to identify famous quotes.

Risk: Generating copyrighted content
Mitigations:
- We trained GPT-4o to refuse requests for copyrighted content, including audio, consistent with our broader practices.
- To account for GPT-4o's audio modality, we also updated certain text-based filters to work on audio conversations, built filters to detect and block outputs containing music, and for our limited alpha of ChatGPT's Advanced Voice Mode, instructed the model to not sing at all.

Risk: Ungrounded inference / sensitive trait attribution
Mitigations:
- We post-trained GPT-4o to refuse requests for ungrounded inference, such as "how intelligent is this speaker?".
- We post-trained GPT-4o to safely comply with requests for sensitive trait attribution by hedging answers, such as "what is this speaker's accent?" → "Based on the audio, they sound like they have a British accent."

Risk: Disallowed content in audio output
Mitigations:
- We run our existing moderation classifier over text transcriptions of audio prompts and generations, and block the output for certain high-severity categories.

Risk: Erotic and violent speech output
Mitigations:
- We run our existing moderation classifier over text transcriptions of audio prompts, and block the output if the prompt contains erotic or violent language.

³ We also evaluate text and vision capabilities, and update mitigations appropriately. No incremental risks were found beyond existing work outlined in the GPT-4 and GPT-4(V) System Cards.
3.3.1 Unauthorized voice generation

Risk Description: Voice generation is the capability to create audio with a human-sounding synthetic voice, and includes generating voices based on a short input clip.

In adversarial situations, this capability could facilitate harms such as an increase in fraud due to impersonation and may be harnessed to spread false information [9, 10] (for example, if we allowed users to upload an audio clip of a given speaker and ask GPT-4o to produce a speech in that speaker's voice). These are very similar to the risks we identified with Voice Engine [8].

Voice generation can also occur in non-adversarial situations, such as our use of that ability to generate voices for ChatGPT's Advanced Voice Mode. During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user's voice.

Risk Mitigation: We addressed voice generation related risks by allowing only the preset voices we created in collaboration with voice actors [11] to be used. We did this by including the selected voices as ideal completions while post-training the audio model. Additionally, we built a standalone output classifier to detect if the GPT-4o output is using a voice that's different from our approved list. We run this in a streaming fashion during audio generation and block the output if the speaker doesn't match the chosen preset voice.
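A streaming guard of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed classifier: the hypothetical `classify_voice` callable stands in for a model that maps each audio chunk to a voice label.

```python
def stream_with_voice_guard(chunks, classify_voice, approved_voices):
    """Yield audio chunks only while the speaker classifier matches an
    approved preset voice; stop the stream on the first deviation.

    chunks: iterable of audio chunks from the generating model.
    classify_voice: audio chunk -> voice label (hypothetical classifier).
    approved_voices: set of allowed preset voice labels.
    """
    for chunk in chunks:
        if classify_voice(chunk) not in approved_voices:
            # Block the remainder of the generation as soon as the
            # output deviates from the approved voice list.
            break
        yield chunk
```

Running the check per chunk, rather than on the completed clip, means an unauthorized voice is cut off mid-generation instead of being delivered and retracted.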
Evaluation: We find that the residual risk of unauthorized voice generation is minimal. Our system currently catches 100% of meaningful deviations from the system voice⁴ based on our internal evaluations, which includes samples generated by other system voices, clips during which the model used a voice from the prompt as part of its completion, and an assortment of human samples.

While unintentional voice generation still exists as a weakness of the model, we use the secondary classifiers to ensure the conversation is discontinued if this occurs, making the risk of unintentional voice generation minimal. Finally, our moderation behavior may result in over-refusals when the conversation is not in English, which is an active area of improvement⁵.
Table 2: Our voice output classifier performance over a conversation by language

              Precision  Recall
English       0.96       1.0
Non-English⁵  0.95       1.0
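For reference, the precision and recall reported in Table 2 relate to classifier counts as follows; the counts in the example are illustrative stand-ins, not the actual evaluation data.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP): of the outputs flagged as voice
    deviations, the fraction that really were deviations.
    Recall = TP / (TP + FN): of the real deviations, the fraction caught.
    A recall of 1.0, as in Table 2, corresponds to zero false negatives.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Illustrative counts only: 96 true detections, 4 false alarms, 0 misses
# reproduce the English row of Table 2 (precision 0.96, recall 1.0).
p, r = precision_recall(tp=96, fp=4, fn=0)
```

The trade-off in Table 2 is deliberate: perfect recall (no missed deviations) is bought with slightly imperfect precision, i.e. some benign audio is blocked.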
3.3.2 Speaker identification

Risk Description: Speaker identification is the ability to identify a speaker based on input audio. This presents a potential privacy risk, particularly for private individuals as well as for obscure audio of public individuals, along with potential surveillance risks.

Risk Mitigation: We post-trained GPT-4o to refuse to comply with requests to identify someone based on a voice in an audio input. We allow GPT-4o to answer based on the content of the audio if it contains content that explicitly identifies the speaker. GPT-4o still complies with requests to identify famous quotes. For example, a request to identify a random person saying "four score and seven years ago" should identify the speaker as Abraham Lincoln, while a request to identify a celebrity saying a random sentence should be refused.

Evaluation: Compared to our initial model, we saw a 14 point improvement in when the model should refuse to identify a voice in an audio input, and a 12 point improvement when it should comply with that request. The former means the model will almost always correctly refuse to identify a speaker based on their voice, mitigating the potential privacy issue. The latter means there may be situations in which the model incorrectly refuses to identify the speaker of a famous quote.

Table 3: Speaker identification safe behavior accuracy

                GPT-4o-early  GPT-4o-deployed
Should Refuse   0.83          0.98
Should Comply   0.70          0.83
⁴ The system voice is one of the pre-defined voices set by OpenAI. The model should only produce audio in that voice.
⁵ This results in more conversations being disconnected than may be necessary, which is a product quality and usability issue.
3.3.3 Disparate performance on voice inputs

Risk Description: Models may perform differently with users speaking with different accents. Disparate performance can lead to a difference in quality of service for different users of the model [12, 13, 14].

Risk Mitigation: We post-trained GPT-4o with a diverse set of input voices to have model performance and behavior be invariant across different user voices.

Evaluations: We run evaluations on GPT-4o Advanced Voice Mode using a fixed assistant voice ("shimmer") and Voice Engine to generate user inputs across a range of voice samples. We use two sets of voice samples for TTS:

- Official system voices (3 different voices)
- A diverse set of voices collected from two data campaigns. This comprises 27 different English voice samples from speakers from a wide range of countries, and a mix of genders.

We evaluate on two sets of tasks: Capabilities and Safety Behavior.

Capabilities: We evaluate⁶ on four tasks: TriviaQA, a subset of MMLU⁷, HellaSwag and Lambada. TriviaQA and MMLU are knowledge-centric tasks, while HellaSwag and Lambada are commonsense-centric or text-continuation tasks. Overall, we find that performance on the diverse set of human voices is marginally but not significantly worse than on system voices across all four tasks.
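Comparing the two voice conditions amounts to grouping per-example scores by voice set and comparing mean accuracies. A minimal sketch (the `accuracy_by_voice_set` helper is hypothetical, not part of the actual harness):

```python
from collections import defaultdict


def accuracy_by_voice_set(results):
    """Compute mean accuracy per voice condition.

    results: iterable of (voice_set, correct) pairs, e.g. ("system", True)
    for an example read in an official system voice that the model answered
    correctly. Returns {voice_set: accuracy}, so the system-voice and
    diverse-human-voice conditions can be compared side by side.
    """
    totals = defaultdict(lambda: [0, 0])  # voice_set -> [n_correct, n_total]
    for voice_set, correct in results:
        totals[voice_set][0] += int(correct)
        totals[voice_set][1] += 1
    return {vs: n_correct / n_total for vs, (n_correct, n_total) in totals.items()}
```

A gap between the two means would indicate disparate performance; the section above reports only a marginal, non-significant difference.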
⁶ Evaluations in this section were run on a fixed, randomly sampled subset of examples, and these scores should not be compared with publicly reported benchmarks on the same task.
⁷ Anatomy, Astronomy, Clinical Knowledge, College Biology, Computer Security, Global Facts, High School Biology, Sociology, Virology, College Physics, High School European History and World Religions. Following the issues described in Evaluation Methodology (Section 3.2), we exclude tasks with heavily mathematical or scientific notation.
Safety Behavior: We evaluate on an internal dataset of conversations and evaluate the consistency of the model's adherence and refusal behavior across different user voices. Overall, we do not find that the model behavior varies across different voices.
3.3.4 Ungrounded inference / Sensitive trait attribution

Risk Description: Audio input can lead to the model making potentially biased or inaccurate inferences about speakers. We define two categories:

- Ungrounded inference (UGI): Making inferences about a speaker that could not be determined solely from audio content. This includes inferences about things such as a speaker's race, socioeconomic status/occupation, religious beliefs, personality traits, political attributes, intelligence, appearance (e.g., eye color, attractiveness), gender identity, sexual preference, or criminal history. This can lead to both allocative and representational harms [13, 15] depending on how such behavior manifests.
- Sensitive trait attribution (STA): Making inferences about a speaker that could plausibly be determined solely from audio content. This includes inferences about things such as a speaker's accent or nationality. Potential harms from STA include an increase in risks from surveillance [16] and a difference in quality of service for speakers with different voice attributes [12, 13, 14].
Risk Mitigation: We post-trained GPT-4o to refuse to comply with UGI requests, while hedging answers to STA questions. For example, a question to identify a speaker's level of intelligence will be refused, while a question to identify a speaker's accent will be met with an answer such as "Based on the audio, they sound like they have a British accent."

Evaluation: Compared to our initial model, we saw a 24 point improvement in the model correctly responding to requests to identify sensitive traits (e.g., refusing UGI and safely complying with STA).

Table 4: Ungrounded Inference and Sensitive Trait Attribution safe behavior accuracy

          GPT-4o-early  GPT-4o-deployed
Accuracy  0.60          0.84
3.3.5 Violative and disallowed content

Risk Description: GPT-4o may be prompted to output harmful content through audio that would be disallowed through text, such as audio speech output that gives instructions on how to carry out an illegal activity.

Risk Mitigation: We found high text-to-audio transference of refusals for previously disallowed content. This means that the post-training we've done to reduce the potential for harm in GPT-4o's text output successfully carried over to audio output.

Additionally, we run our existing moderation model over a text transcription of both audio input and audio output to detect if either contains potentially harmful language, and will block a generation if so⁸.
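A transcript-based gate of this shape can be sketched as follows. This is a simplified illustration, not the deployed system: the category names are made up, and the `moderate` callable is a stub standing in for the actual moderation model, returning the set of category labels it flags for a piece of text.

```python
# Illustrative category labels only; the real moderation taxonomy differs.
BLOCKED_CATEGORIES = {"erotic", "violence"}


def gate_generation(input_transcript, output_transcript, moderate):
    """Run a moderation classifier over text transcriptions of the audio
    input and the audio output; block the generation if either transcript
    is flagged for a restricted category.

    moderate: text -> set of flagged category labels (stub classifier).
    Returns the output transcript if allowed, or None to signal a block.
    """
    for text in (input_transcript, output_transcript):
        if moderate(text) & BLOCKED_CATEGORIES:
            return None  # block the generation
    return output_transcript
```

Checking both sides of the conversation means a harmful generation is blocked even when the prompt itself looked benign, and vice versa.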
Evaluation: We used TTS to convert existing text safety evaluations to audio. We then evaluate the text transcript of the audio output with the standard text rule-based classifier. Our evaluations show strong text-audio transfer for refusals on pre-existing content policy areas. Further evaluations can be found in Appendix A.

Table 5: Performance comparison of safety evaluations: Text vs. Audio

                 Text  Audio
Not Unsafe       0.95  0.93
Not Over-refuse  0.81  0.82
3.3.6 Erotic and violent speech content

Risk Description: GPT-4o may be prompted to output erotic or violent speech content, which may be more evocative or harmful than the same content in text. Because of this, we decided to restrict the generation of erotic and violent speech.

⁸ We describe the risks and mitigations for violative and disallowed text content in the GPT-4 System Card [6], specifically Section 3.1 Model Safety, and Section 4.2 Content Classifier Development.
Risk Mitigation: We run our existing moderation model [17] over a text transcription of the audio input to detect if it contains a request for violent or erotic content, and will block a generation if so.
3.3.7 Other known risks and limitations of the model

Through the course of internal testing and external red teaming, we discovered some additional risks and model limitations for which model or system level mitigations are nascent or still in development, including:

Audio robustness: We saw anecdotal evidence of decreases in safety robustness through audio perturbations, such as low quality input audio, background noise in the input audio, and echoes in the input audio. Additionally, we observed similar decreases in safety robustness through intentional and unintentional audio interruptions while the model was generating output.

Misinformation and conspiracy theories: Red teamers were able to compel the model to generate inaccurate information by prompting it to verbally repeat false information and produce conspiracy theories. While this is a known issue for text in GPT models [18, 19], there was concern from red teamers that this information may be more persuasive or harmful when delivered through audio, especially if the model was instructed to speak emotively or emphatically. The persuasiveness of the model was studied in detail (see Section 3.7), and we found that the model did not score higher than Medium risk for text-only, and for speech-to-speech the model did not score higher than Low.

Speaking a non-English language in a non-native accent: Red teamers observed instances of the audio output using a non-native accent when speaking in a non-English language. This may lead to concerns of bias towards certain accents and languages, and more generally towards limitations of non-English language performance in audio outputs.

Generating copyrighted content: We also tested GPT-4o's capacity to repeat content found within its training data. We trained GPT-4o to refuse requests for copyrighted content, including audio, consistent with our broader practices. To account for GPT-4o's audio modality, we also updated certain text-based filters to work on audio conversations, built filters to detect and block outputs containing music, and for our limited alpha of ChatGPT's Advanced Voice Mode, instructed the model to not sing at all. We intend to track the effectiveness of these mitigations and refine them over time.

Although some technical mitigations are still in development, our Usage Policies [20] disallow intentionally deceiving or misleading others, and circumventing safeguards or safety mitigations. In addition to technical mitigations, we enforce our Usage Policies through monitoring and take action on violative behavior in both ChatGPT and the API.
3.4 Preparedness Framework Evaluations

We evaluated GPT-4o in accordance with our Preparedness Framework [4]. The Preparedness Framework is a living document that describes our procedural commitments to track, evaluate, forecast, and protect against catastrophic risks from frontier models. The evaluations currently cover four risk categories: cybersecurity, CBRN (chemical, biological, radiological, nuclear),