Action recognition in videos, especially for violence detection, is now a hot topic in computer vision. Interest in this task is driven by the growing number of videos from surveillance cameras and live television, which produce complex 2D + t data. State-of-the-art methods rely on end-to-end learning with 3D neural networks, which must be trained on large amounts of data to obtain discriminative features. To overcome this limitation, we present in this article a method that classifies videos for violence recognition using a classical 2D convolutional neural network (CNN). The strategy of the method is two-fold: first, we build several 2D spatio-temporal representations from an input video; second, these representations are fed to the CNN for training and testing. The classification decision for a video is obtained by aggregating the individual decisions made on its different 2D spatio-temporal representations. An experimental study on public datasets containing violent videos highlights the interest of the presented method.
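The abstract does not say which 2D spatio-temporal representations are built or which aggregation rule is used, so the following is only a minimal sketch under stated assumptions. It assumes x-t and y-t slices (one pixel row or column tracked over time) as the representations and a majority vote as the aggregation rule. `xt_slice`, `yt_slice`, `build_representations`, `aggregate_majority` and `toy_classifier` are hypothetical names, and `toy_classifier` merely stands in for the trained 2D CNN.

```python
from collections import Counter
from typing import List, Sequence

# A grayscale video is modeled as T frames, each H rows of W pixel values.
Video = List[List[List[float]]]
Image2D = List[List[float]]


def xt_slice(video: Video, row: int) -> Image2D:
    """Assumed x-t representation: pixel row `row` of every frame, stacked
    over time into a T x W image."""
    return [list(frame[row]) for frame in video]


def yt_slice(video: Video, col: int) -> Image2D:
    """Assumed y-t representation: pixel column `col` of every frame, stacked
    over time into a T x H image."""
    return [[r[col] for r in frame] for frame in video]


def build_representations(video: Video, rows: Sequence[int],
                          cols: Sequence[int]) -> List[Image2D]:
    """Build several 2D spatio-temporal images from a single video."""
    return [xt_slice(video, r) for r in rows] + [yt_slice(video, c) for c in cols]


def toy_classifier(image: Image2D, threshold: float = 0.5) -> int:
    """Hypothetical stand-in for the trained 2D CNN: returns 1 ("violent") when
    the mean absolute change between consecutive time steps exceeds `threshold`."""
    diffs = [abs(a - b)
             for prev, cur in zip(image, image[1:])
             for a, b in zip(prev, cur)]
    return int(sum(diffs) / len(diffs) > threshold) if diffs else 0


def aggregate_majority(labels: Sequence[int]) -> int:
    """Video-level decision by majority vote over per-representation labels.
    Ties go to the label that appears first."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


def classify_video(video: Video, rows: Sequence[int], cols: Sequence[int]) -> int:
    """Classify every 2D representation, then aggregate into one decision."""
    reps = build_representations(video, rows, cols)
    return aggregate_majority([toy_classifier(img) for img in reps])


if __name__ == "__main__":
    # Tiny 3-frame, 2x2 video that flickers between dark and bright frames.
    flicker = [[[0.0, 0.0], [0.0, 0.0]],
               [[1.0, 1.0], [1.0, 1.0]],
               [[0.0, 0.0], [0.0, 0.0]]]
    static = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
    print(classify_video(flicker, rows=[0, 1], cols=[0, 1]))  # 1
    print(classify_video(static, rows=[0, 1], cols=[0, 1]))   # 0
```

A majority vote is only one plausible aggregation choice. Averaging per-representation class probabilities from the CNN would be an equally natural alternative when soft scores are available.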