Scene text recognition (STR) has emerged as a challenging task in computer vision due to variations in font styles, illumination, perspective distortions, occlusions, and complex backgrounds. Traditional recognition methods often struggle to maintain robustness in such unconstrained environments. To address these challenges, we propose a novel approach that integrates structure-guided character detection with linguistic knowledge modeling for improved recognition accuracy. The system first employs a structure-aware character detection mechanism that leverages spatial relationships between characters to generate reliable candidate regions, reducing the effect of noisy backgrounds and distortions. Subsequently, linguistic knowledge is incorporated through lexicon constraints and language models to refine recognition outputs and enforce semantic consistency. This joint use of structural cues and linguistic priors enables the system not only to detect characters more precisely but also to correct misclassifications in ambiguous scenarios. Experimental results on benchmark scene text datasets demonstrate that the proposed method significantly outperforms conventional STR approaches, achieving higher accuracy and robustness under real-world conditions.
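To make the lexicon-constraint idea concrete, here is a minimal sketch of one common refinement strategy: snapping a noisy recognition output to the closest entry in a domain lexicon. The names `LEXICON` and `refine`, and the use of `difflib`, are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch of lexicon-constrained refinement, assuming a small domain
# lexicon is available. The detector's raw string is corrected by fuzzy
# matching; real systems typically rescore with a language model instead.
from difflib import get_close_matches

LEXICON = ["OPEN", "CLOSED", "EXIT", "ENTRANCE"]  # hypothetical lexicon

def refine(raw_prediction: str, cutoff: float = 0.6) -> str:
    """Snap a noisy STR output to the closest lexicon entry, if close enough."""
    matches = get_close_matches(raw_prediction.upper(), LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else raw_prediction

print(refine("EX1T"))  # -> "EXIT": the lexicon prior corrects the ambiguous '1'
```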
Human Attention Prediction in Natural Daily Life with a Fine-Grained Human-Environment-Object Interaction Model
Copy-move forgery is one of the most prevalent image tampering techniques, where a region of an image is copied and pasted within the same image to conceal or duplicate content. Detecting such manipulations is highly challenging due to post-processing operations such as scaling, rotation, and compression. In this work, we propose a novel framework for copy-move forgery detection that integrates deep PatchMatch with pairwise ranking learning. The deep PatchMatch module leverages deep feature representations to establish reliable correspondences between image patches, overcoming the limitations of handcrafted descriptors. Subsequently, a pairwise ranking learning strategy is employed to differentiate authentic patch correspondences from forged ones, enabling robust detection even under complex transformations. The proposed approach achieves precise localization of forged regions while remaining resilient to common post-processing attacks. Extensive experiments on publicly available benchmark datasets demonstrate that our method outperforms existing state-of-the-art techniques in both detection accuracy and localization quality. This work highlights the potential of combining deep patch-similarity search with learning-based ranking for advancing image forensics.
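The pairwise ranking objective can be sketched briefly: authentic correspondences should score higher than forged ones by a margin. The feature dimensions, the `scorer` network, and the random stand-in tensors below are assumptions for illustration; only the ranking loss reflects the stated idea.

```python
# Illustrative sketch of pairwise ranking learning on patch-correspondence
# features, using PyTorch's built-in margin ranking loss. Data loading and
# the deep PatchMatch feature extractor are omitted.
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MarginRankingLoss(margin=1.0)

authentic = torch.randn(32, 256)   # features of genuine correspondences (stand-in)
forged = torch.randn(32, 256)      # features of copy-moved correspondences (stand-in)

s_auth, s_forg = scorer(authentic), scorer(forged)
target = torch.ones(32, 1)         # +1 => first input should rank higher
loss = loss_fn(s_auth, s_forg, target)
loss.backward()                    # gradients push authentic scores above forged
```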
The rapid rise of video content has created a pressing need for robust video-language understanding (VLU) systems capable of linking visual dynamics with natural language reasoning. However, progress is constrained by what we call the impossible data trinity: the simultaneous requirement for large-scale data quantity, fine-grained annotation quality, and broad domain diversity. Traditional dataset construction approaches struggle to balance all three dimensions, inevitably sacrificing one to achieve the others. In this work, we introduce the video DataFlywheel, a self-reinforcing paradigm that leverages multimodal foundation models, synthetic data generation, and human-in-the-loop refinement to progressively resolve the trinity challenge. The flywheel operates in iterative cycles: models generate and refine pseudo-labeled data, which is then validated and expanded by scalable curation strategies, feeding back into stronger models and richer datasets. We demonstrate how this framework accelerates the creation of diverse, high-quality, and large-scale video-language resources, reducing reliance on costly manual annotation while maintaining semantic fidelity. Beyond dataset construction, the video DataFlywheel provides a blueprint for sustainable VLU research, moving the community closer to generalizable, efficient, and context-aware video-language understanding.
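The iterative cycle can be summarized schematically. Everything below is an assumption for illustration: the `caption`/`train` interfaces, the confidence gate, and the loop shape stand in for the paper's actual generation, curation, and retraining components.

```python
# Schematic sketch of one flywheel cycle: generate pseudo-labels, gate them
# with a scalable curation check, grow the dataset, retrain. All names are
# hypothetical placeholders, not the paper's components.
def flywheel_cycle(model, unlabeled_videos, dataset, threshold=0.9):
    for video in unlabeled_videos:
        caption, confidence = model.caption(video)   # pseudo-label generation
        if confidence >= threshold:                  # curation gate (could be
            dataset.append((video, caption))         # human-in-the-loop instead)
    model.train(dataset)                             # stronger model next cycle
    return model, dataset
```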
Biometric authentication has become a cornerstone of modern security systems due to its uniqueness and resistance to forgery. However, traditional biometric verification methods often raise concerns about privacy leakage, data misuse, and template theft. This work proposes a privacy-preserving biometric verification framework that uses handwritten random digit strings as an authentication factor. In the proposed system, users are prompted to write a randomly generated digit sequence, combining the inherent individuality of handwriting dynamics with the unpredictability of random strings. This approach prevents replay attacks, minimizes the risk of stolen static templates, and enhances resilience against impersonation attempts. The verification process leverages machine-learning-based handwriting recognition and feature extraction techniques, while privacy-preserving transformations ensure that raw biometric data is never stored or transmitted in its original form. Experimental evaluation demonstrates that the method balances robust verification accuracy, user privacy, and system security, making it a promising solution for next-generation secure authentication systems.
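The challenge-response flow that defeats replay attacks can be sketched in a few lines. The `recognize_digits` and `match_writer` callables are hypothetical placeholders for the recognition and writer-verification models described above.

```python
# Minimal sketch of the challenge-response flow: the server issues a fresh
# random digit string per session, so a replayed recording of an earlier
# session fails the content check before any biometric matching happens.
import secrets

def issue_challenge(length: int = 6) -> str:
    """Fresh, unpredictable digit string per session (prevents replay)."""
    return "".join(secrets.choice("0123456789") for _ in range(length))

def verify(sample, challenge, recognize_digits, match_writer) -> bool:
    # 1) Content check: the written digits must equal this session's challenge.
    if recognize_digits(sample) != challenge:
        return False
    # 2) Writer check: handwriting dynamics compared in a transformed,
    #    non-invertible feature space, never against raw stored strokes.
    return match_writer(sample)
```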
This project presents a Python-based application that converts text embedded in images into editable, translatable text and delivers fluent output in a target language. The system couples image preprocessing (noise removal, binarization, skew correction) with optical character recognition (OCR) to extract text from varied inputs such as documents, signboards, and screenshots. Language identification triggers a neural machine translation pipeline to produce the translated text, while confidence scores guide optional human review. A lightweight GUI enables drag-and-drop image input, batch processing, and export to TXT/PDF. The implementation leverages open libraries for computer vision and OCR, supports on-device processing for privacy, and can fall back to online translation services for higher quality. Experiments on multilingual datasets evaluate OCR accuracy, translation quality (BLEU/chrF), and latency across device profiles. Results show that careful preprocessing and model selection substantially improve end-to-end quality, making the tool practical for education, travel, and accessibility use cases (e.g., assisting low-vision users). The system's modular design facilitates future upgrades, including domain-specific glossaries and fully offline neural translation.
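A condensed sketch of the preprocessing-plus-OCR stage follows, assuming OpenCV and pytesseract are installed. The kernel size and Otsu thresholding are typical defaults rather than the project's tuned settings, and skew correction is omitted for brevity.

```python
# Sketch of the preprocessing + OCR stage: denoise, binarize, then extract
# text. Skew correction (mentioned above) is left out to keep this short.
import cv2
import pytesseract

def extract_text(path: str) -> str:
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 3)                       # noise removal
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    return pytesseract.image_to_string(binary)

print(extract_text("signboard.jpg"))  # illustrative input file
```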
The Aadhaar card is one of the most widely used identification documents in India, containing essential demographic details and a unique 12-digit identification number. With the rapid digitalization of services, automating the extraction of Aadhaar card information has become a critical requirement for applications such as e-KYC, digital onboarding, and identity verification. This project develops a web-based application that extracts Aadhaar card details and the profile image using optical character recognition (OCR) and Haar-cascade-based face detection. The system uses OCR to identify and extract textual information such as the Aadhaar number, name, date of birth, gender, and address directly from the scanned or uploaded Aadhaar card image. Simultaneously, the Haar cascade, a machine-learning-based object detection algorithm, is employed to detect and extract the profile image from the card. The extracted details are then structured and displayed on the webpage for further use in authentication or record management. This approach minimizes manual data entry, reduces human error, and accelerates digital onboarding. The integration of OCR and Haar cascades ensures efficient extraction of both textual and image components, making the system reliable, scalable, and suitable for real-world identity verification applications.
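The Haar-cascade face crop can be sketched directly with OpenCV, whose distribution ships the frontal-face cascade file. The assumption that the first detection is the profile photo, and the detection parameters, are illustrative choices.

```python
# Sketch of the Haar-cascade step: detect the face region on the card and
# crop it out as the profile image. Field parsing via OCR is omitted here.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_profile_photo(card_path: str):
    img = cv2.imread(card_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                  # no face found on this scan
    x, y, w, h = faces[0]            # assume the first detection is the photo
    return img[y:y + h, x:x + w]
```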
Stone inscriptions are among the most significant sources of historical, cultural, and linguistic knowledge, especially for understanding ancient civilizations. In Maharashtra, stone inscriptions written in early forms of the Marathi script provide invaluable insight into the socio-political, religious, and cultural developments of their time. However, the manual study of such inscriptions is challenging due to natural weathering, erosion, script variations, and the complexity of ancient writing styles. In recent years, optical character recognition (OCR) has emerged as a powerful tool to digitize and analyze ancient scripts, enabling automated recognition and preservation of historical texts. This paper presents a comprehensive survey of existing methods for ancient Marathi script recognition from stone inscriptions. The study reviews key preprocessing techniques such as image enhancement, noise reduction, and segmentation; feature extraction methods including structural, statistical, and deep-learning-based approaches; and classification models ranging from traditional machine learning to modern convolutional neural networks. Challenges such as degraded surfaces, broken characters, and script variability are discussed alongside potential solutions. The survey also highlights future research directions, particularly the integration of deep learning, transfer learning, and multimodal analysis for improved accuracy. By compiling and analyzing existing research, this work aims to provide a foundation for developing robust OCR systems tailored to ancient Marathi stone inscriptions, thereby contributing to digital preservation and historical scholarship.
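For illustration, one enhancement step commonly reviewed in this line of work is contrast-limited adaptive histogram equalization (CLAHE), which can lift faint strokes on weathered surfaces before segmentation. The parameters below are typical defaults, not values from any surveyed paper.

```python
# Example preprocessing step for degraded inscription images: CLAHE boosts
# local contrast so eroded strokes separate better from the stone surface.
import cv2

def enhance_inscription(path: str):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)
```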
The aim of this project is to develop a web application that extracts text from an image and translates it into a desired language. The application is built in Python using the Tesseract OCR engine and the Flask web framework: Tesseract extracts text from the image, and Flask serves the web application. The user can upload an image containing text in any language; the image is processed with Tesseract to extract the text, which is then translated into the desired language using a translation API, and the translated text is displayed on the web page. The user selects the target language from a dropdown menu, and the application supports a wide range of languages for translation. The application is designed to be user-friendly, with a simple and intuitive interface: the user can upload an image and select the target language in just a few clicks. The application is also scalable and can handle large volumes of image and text data.
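A minimal sketch of the described route follows, assuming pytesseract and Pillow are installed. The `translate` function is a placeholder stub for whichever translation API the deployment uses; the route path and form field names are illustrative.

```python
# Minimal Flask route for the upload -> OCR -> translate -> display flow.
from flask import Flask, request
from PIL import Image
import pytesseract

app = Flask(__name__)

def translate(text: str, lang: str) -> str:
    """Stub for the translation API call; swap in the real service client."""
    return text  # placeholder so the sketch runs end to end

@app.route("/translate", methods=["POST"])
def translate_image():
    image = Image.open(request.files["image"].stream)   # uploaded image
    target = request.form.get("lang", "en")             # from the dropdown menu
    text = pytesseract.image_to_string(image)           # OCR extraction
    return {"original": text, "translated": translate(text, target)}
```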
The development of information technology has steadily changed how information is exchanged, creating a need to digitize printed documents. Fraud, such as account fraud, is now common; to prevent it, identity is verified by extracting ID card details using OCR and NLP. Optical character recognition (OCR) is a technology used to generate text from images; with OCR, we can extract the contents of an Aadhaar card as text using Pytesseract. To improve accuracy, we correct the extracted text using basic natural language processing (NLP) tools. Using five Aadhaar card images, we compared the performance of three different OCR libraries, and our experiments show that Pytesseract performed best. The resulting edge image contains broken characters. To fill these gaps, we apply the dilation operator, which increases the thickness of the characters. Dilation fills the broken characters but also adds extra thickness, which is then removed by morphological thinning. Finally, dilation and thinning are applied in combination with optical character recognition (OCR) to segment and recognize the fields, including the name, ID, date of birth, gender, and photo of the person.
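The dilation-then-thinning repair can be sketched with OpenCV. The 3x3 kernel is an illustrative choice, and `cv2.ximgproc.thinning` requires the opencv-contrib-python package; the paper's exact operators and parameters may differ.

```python
# Sketch of the stroke-repair step: dilation closes gaps in broken character
# strokes, then thinning strips the extra thickness dilation introduced.
import cv2
import numpy as np

def repair_characters(edge_img):
    kernel = np.ones((3, 3), np.uint8)
    dilated = cv2.dilate(edge_img, kernel, iterations=1)   # fill broken strokes
    return cv2.ximgproc.thinning(dilated)                  # remove added thickness
```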
The exponential growth in the generation of fake ID cards has increased the incidence of forgery, with severe security and privacy threats. University ID cards are used to authenticate actual employees and students of a university, but manual examination of ID cards is laborious. In this paper, we therefore propose an effective automated method for employee/student authentication based on analyzing the cards; our method also identifies the department of the employee/student concerned. For this purpose, we employ different image enhancement and morphological operators to make the appearance of the input image better suited to recognition. More specifically, we employ median filtering to remove noise from the given input image.
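The enhancement stage can be sketched briefly: median filtering for noise, followed by one example morphological operator. The kernel sizes and the choice of an opening are illustrative assumptions, not the paper's reported settings.

```python
# Sketch of the enhancement stage: median filter suppresses salt-and-pepper
# noise, then a morphological opening removes small artifacts before
# recognition.
import cv2

def enhance_card(path: str):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    denoised = cv2.medianBlur(img, 5)                        # remove impulse noise
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    return cv2.morphologyEx(denoised, cv2.MORPH_OPEN, kernel)
```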
Rigorous research has been conducted on ancient Indian script character recognition, and many research articles have been published over the last few decades. A number of OCR tools are available on the market, but they are not suited to ancient script recognition, and more work is required to recognize ancient Marathi scripts in particular. This paper reviews the techniques published by different researchers for recognizing ancient scripts and discusses the challenges involved in recognizing ancient Marathi scripts.