How My Machine Can “See”

Categories: Computer Science, Computer Vision

A quick entry on a process for extracting geometry from images, and also a way to get my mind off things xP.

Primitive Extraction

Scan the image pixel by pixel (this needs raw, decoded pixel data; compressed formats such as JPEG or PNG must be decoded first, whereas RAW formats can be read directly). Set thresholds to look for adjacent pixels whose RGBA values fall within a controlled interval; the interval can be tweaked. So what are the types of extractions?

–       Lines: RGBA(xn, yn) passes if RGBA(xn-1, yn-1) is in the same interval.

–       Triangles: builds upon Lines; the scan walks adjacent pixels, always looking for surrounding ones to set a vertex of the triangle. This can be costly, since a search through pixels in one direction can yield a side with no vertex. The complexity decreases with each vertex discovered, so for an MxN picture the search space goes roughly MxN, then (M-1)x(N-1), then (M-LineLength)x(N-LineLength).

TriSearch(x, y) {
    for( k = 0 … 2π ) {
        if( RGBA(x + cos(k) * radius, y + sin(k) * radius) passes )
            TriSearch(x + cos(k) * radius, y + sin(k) * radius);
    }
}
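To make the two ideas above concrete, here is a minimal runnable sketch of the per-pixel interval test, a line walk, and the circular sampling ring that the triangle search scans. Names like `in_interval`, `tol`, and `radius` are my own for illustration; the image is assumed to be already decoded into rows of RGBA tuples.

```python
import math

def in_interval(p, q, tol=10):
    """True if two RGBA pixels differ by at most `tol` per channel."""
    return all(abs(a - b) <= tol for a, b in zip(p, q))

def extract_line(img, x, y, dx, dy, tol=10):
    """Walk from (x, y) in direction (dx, dy) while adjacent pixels
    stay inside the controlled RGBA interval; return the run of points."""
    h, w = len(img), len(img[0])
    points = [(x, y)]
    while 0 <= x + dx < w and 0 <= y + dy < h and \
          in_interval(img[y][x], img[y + dy][x + dx], tol):
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def radial_candidates(x, y, radius, steps=16):
    """Sample points on a circle around (x, y): the k = 0 … 2π ring
    that TriSearch scans when hunting for the next vertex."""
    return [(round(x + math.cos(2 * math.pi * k / steps) * radius),
             round(y + math.sin(2 * math.pi * k / steps) * radius))
            for k in range(steps)]
```

On a tiny test image with one black row, `extract_line` returns the whole run of matching pixels along that row.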

Not to get too wordy, but the algorithms would have to scale to higher N-polygons. In more complex cases we can start testing for concavity. They can be further optimized by storing pre-coded primitives so we can skip brute-force searching pixel by pixel (aka shape matching).
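The concavity test mentioned above can be sketched with cross-product signs: a polygon is convex exactly when consecutive edge turns all bend the same way. This is a standard check, not something from the post itself; the polygon is assumed to be an ordered list of (x, y) vertices.

```python
def is_convex(vertices):
    """A polygon is convex if the cross products of consecutive
    edge vectors all share the same sign (zeros are ignored)."""
    n = len(vertices)
    signs = set()
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        x2, y2 = vertices[(i + 2) % n]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross != 0:
            signs.add(cross > 0)
    return len(signs) <= 1
```

A square passes; pushing one vertex inward makes the sign flip and the test fail.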

One last thing during this phase is to normalize the shape sizes so that they fit in x: [-1, 1] and y: [-1, 1].
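The normalization step is a simple scale-and-translate; a minimal sketch, assuming shapes are lists of (x, y) vertex tuples:

```python
def normalize(vertices):
    """Scale and translate vertices so x and y each span [-1, 1]."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    def scale(v, lo, hi):
        # Degenerate axis (all points equal) collapses to 0.
        return 0.0 if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0
    return [(scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys)))
            for x, y in vertices]
```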

Shape Matching

Once we’ve built our primitives we can start testing against a database that has already converted shapes into primitives. For example, a pencil is a line, a table is a square with four lines, and so on. This is potentially much faster than storing the pixel data of pictures. The conversion process can be further optimized by reducing the number of vertices of the stored shapes, which speeds up the shape-matching step.

–       Function Match(Shape):

                  db.query(vertex_count == Shape.vertex_count).onFinish( (result_array) {
                          loop( i = 0; i < result_array.length; ++i ) {
                                  isFound = true;
                                  loop( j = 0; j < result_array[ i ].vertex_count; ++j ) {
                                          if( Shape.vertices[ j ] != result_array[ i ].vertices[ j ] ) {
                                                  isFound = false;
                                                  break;
                                          }
                                  }
                                  if( isFound ) break;
                          }
                  } )
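Here is a runnable sketch of that match loop, replacing the asynchronous database query with a plain list of stored shapes. The `tol` parameter is my addition, since exact equality on normalized floating-point vertices would almost never hold in practice; the shape/database layout is invented for illustration.

```python
def match(shape, database, tol=0.05):
    """Return the first stored shape whose vertices all lie within
    `tol` of the candidate's; shapes are dicts with a 'vertices' list."""
    # Pre-filter by vertex count, like the db.query above.
    candidates = [s for s in database
                  if len(s["vertices"]) == len(shape["vertices"])]
    for stored in candidates:
        if all(abs(ax - bx) <= tol and abs(ay - by) <= tol
               for (ax, ay), (bx, by) in zip(shape["vertices"],
                                             stored["vertices"])):
            return stored
    return None
```

A slightly perturbed triangle still matches the stored triangle, while a shape with no same-count entry returns nothing.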


This step can also compose sets of primitives, since a lot of objects in the real world are a group of primitives rather than a single one. Search complexity will naturally increase, and there is little to do besides throwing more hardware at it (server farms, anyone?).
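One way to represent those groups, as a sketch: a composite object is just a named list of primitives, each with its own normalized vertices. The "table" entry below is made up for illustration, following the square-plus-four-lines example from earlier.

```python
# A composite object is a named group of primitives; matching it means
# matching each member primitive, then checking their relative layout.
table = {
    "name": "table",
    "primitives": [
        {"kind": "square", "vertices": [(-1, -1), (1, -1), (1, 1), (-1, 1)]},
        {"kind": "line", "vertices": [(-1, -1), (-1, 1)]},      # leg
        {"kind": "line", "vertices": [(1, -1), (1, 1)]},        # leg
        {"kind": "line", "vertices": [(-0.5, -1), (-0.5, 1)]},  # leg
        {"kind": "line", "vertices": [(0.5, -1), (0.5, 1)]},    # leg
    ],
}
```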

Closing Thoughts

This is a very short brainstorm of a way of allowing the machine to see. Our optical neural network does a lot of shape matching, albeit at a cell level. We should not forget, though, the multi-state nature of cells compared with the two states of computer bits. There is more work to be done at the hardware level than at the algorithm/software level.