Marco Reis - Software Engineering

Strategy Design Pattern with Java Enum

by masreis
2020-10-04

Strategy is a design pattern that lets software choose one of a family of algorithms at runtime. Each algorithm is implemented in its own class behind a common interface, which makes the algorithms interchangeable from the client's point of view. With the Strategy pattern, a class can execute the same method in different ways by delegating to different implementations. It is one of the patterns described in the book Design Patterns by Gamma et al.
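As a rough illustration of the idea in the post title, the sketch below implements a small family of algorithms as constants of a Java enum. The names `ShippingStrategy`, `costInCents`, and `quote`, as well as the fee values, are hypothetical and chosen only for this example; they are not taken from the original post.

```java
public class StrategyEnumSketch {

    /** Hypothetical family of shipping-cost algorithms, one per enum constant. */
    enum ShippingStrategy {
        STANDARD {
            @Override
            long costInCents(long weightInGrams) {
                return 500; // flat fee, independent of weight
            }
        },
        EXPRESS {
            @Override
            long costInCents(long weightInGrams) {
                return 1000 + weightInGrams / 100; // base fee plus 1 cent per 100 g
            }
        };

        /** The common operation every strategy must implement. */
        abstract long costInCents(long weightInGrams);
    }

    /** The client depends only on the enum type, so any constant can be passed in. */
    static long quote(ShippingStrategy strategy, long weightInGrams) {
        return strategy.costInCents(weightInGrams);
    }

    public static void main(String[] args) {
        // The strategy is selected at runtime, here from a string value.
        ShippingStrategy chosen = ShippingStrategy.valueOf("EXPRESS");
        System.out.println(quote(chosen, 2000)); // prints 1020
    }
}
```

Because each constant carries its own implementation, adding an algorithm means adding a constant, and the client code in `quote` stays unchanged.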

A Guide for Java Enum

by masreis
2020-10-04

Enum is a special type in Java that restricts a variable to the values defined in a fixed, well-known list. Thus,…
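A minimal sketch of that restriction follows. The `Level` enum and the `isValidLevel` helper are hypothetical names used only for illustration.

```java
public class EnumSketch {

    /** Hypothetical enum: a variable of type Level can hold only these three values. */
    enum Level { LOW, MEDIUM, HIGH }

    /** Returns true if the given name matches one of the declared constants. */
    static boolean isValidLevel(String name) {
        try {
            Level.valueOf(name); // throws IllegalArgumentException for unknown names
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Level level = Level.MEDIUM; // the compiler rejects anything outside the list
        System.out.println(level);                   // prints MEDIUM
        System.out.println(Level.values().length);   // prints 3
        System.out.println(isValidLevel("URGENT"));  // prints false
    }
}
```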

Text Extraction and OCR With Apache Tika

by masreis
2020-05-16
1 Comment

Apache Tika is a library for extracting text from most file formats, including PDF, DOC, and PPT. Tika has a simplified interface that extracts the content, making it easy to operate the library. Its main uses are related to the indexing process in search engines, content analysis (journalism, for example), and even translation (using paid APIs).

Text Extraction with Tika Server

by masreis
2020-05-03 (updated 2020-10-04)
1 Comment

Apache Tika is a library for extracting text from most file formats, including PDF, DOC, and PPT. Tika has a simplified interface that extracts the content, which makes the library easy to operate. Its main uses are tied to the indexing process in search engines, content analysis (journalism, for example), and even translation (using paid APIs).

What is digital transformation

by masreis
2020-04-24 (updated 2020-05-03)

Digital transformation is the process of incorporating new technologies into a company's business, changing how it operates and how it delivers products to clients.…

Cassandra Bulk Loading (sstableloader)

by masreis
2019-02-15

Introduction. Bulk loading, or bulk insert, is the process in which a large number of records is inserted into a database in…

Big Data Platform with Hadoop 3, Hive 3, and Spark 2.4

by masreis
2019-01-15 (updated 2019-01-19)
3 Comments

Apache Hadoop has reached version 3, bringing long-awaited new features. Of course, installing and configuring the ecosystem of…

List of Datasets for Download

by masreis
2017-11-15 (updated 2018-09-29)

Some datasets available for download that can be used to study data science.
http://dados.gov.br/
https://www.data.gov/
http://open.canada.ca/en
https://data.gov.uk/
https://www.healthdata.gov/
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
http://snap.stanford.edu/data/sx-stackoverflow.html
https://archive.org/web/
https://index.okfn.org/dataset/
http://snap.stanford.edu/data/
https://github.com/caesar0301/awesome-public-datasets…

MariaDB Password on Debian 9 Stretch

by masreis
2017-08-11 (updated 2017-08-15)

MariaDB Password on Debian Stretch. The new stable release of Debian, codenamed Stretch, ships with MariaDB as the only MySQL variant. Starting from…